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Abstract 



In this thesis, a Bayes linear methodology for the adjustment of covariance matrices is 
presented and discussed. A geometric framework for quantifying uncertainties about 
covariance matrices is set up, and an inner-product for spaces of random matrices 
is motivated and constructed. The inner-product on this space captures aspects 
of belief about the relationships between covariance matrices of interest, providing 
a structure rich enough to adjust beliefs about unknown matrices in the light of 
data such as sample covariance matrices, exploiting second-order exchangeability and 
related specifications to obtain representations allowing analysis. 

Adjustment is associated with orthogonal projection, and illustrated by examples 
for some common problems. The difficulties of adjusting the covariance matrices 
underlying exchangeable random vectors is tackled and discussed. Learning about the 
covariance matrices associated with multivariate time series dynamic linear models is 
shown to be amenable to a similar approach. Diagnostics for matrix adjustments are 
also discussed. 
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Chapter 1 
Introduction 



. . . There is no way, however, in which the individual can avoid the burden 
of responsibiUty for his own evaluations. The key cannot be found that will 
unlock the enchanted garden wherein, among the fairy-rings and the shrubs 
of magic wands, beneath the trees laden with monads and noumena, blossom 
forth the flowers of probabilitas realis. With these fabulous blooms safely in 
our button-holes we would be spared the necessity of forming opinions, and the 
heavy loads we bear upon our necks would be rendered superfluous once and 
for all. 

Bruno de Finetti 

Theory of Probability, Vol 2 

1.1 Introduction 

Often random variables (or unknown quantities) are not independent of one another. 
That is, knowledge of the outcome of a particular random variable effects beliefs 
about other random variables. Clearly the random variables the amount of rain to fall 
this Saturday and the amount of rain to fall this Sunday fall into this category, since 



12 



CHAPTER 1. INTRODUCTION 



13 



knowledge of the amount of rain that fell on Saturday will be very helpful in assessing 
the amount of rain to fall on Sunday. A common way of measuring the amount of 
linear association between two random variables is the covariance between them. 
Given collections of random variables, one may extend the concept of covariance to 
that of the covariance matrix. This is a matrix of numbers containing information 
about the pairwise linear association between variables. This thesis is concerned with 
the modelling and revising of covariance matrices in the light of predictive data. A 
Bayes linear approach will be taken to the problem, and the theory will be illustrated 
with examples for some common statistical problems. 



1.2 Prevision and expectation 

In this thesis, a random quantity is any well-determined real number whose precise 
value is unknown. Beliefs about the "location" of a given random quantity, X, are 
quantified by making a statement about it's prevision (de Finetti 1974, Chapter 3), 
P(X). P{X) is the quantity, x that you would choose in order to minimise the loss, 
or penalty, L, given by 

L = K{x-XY (1.1) 

for some unit of loss, K (de Finetti 1974, Section 3.3.6). Such an approach to the 
quantification of uncertainty was advocated in de Finetti (1974), and is used as a 
starting point for the subjectivist theory of statistical inference developed by Gold- 
stein and others (see for example, Goldstein (1981), Farrow and Goldstein (1993) and 
Goldstein (1994)). 

Unfortunately there is a problem with such a definition of prevision, since one 
must have a linear preference for the incurred loss, L. In fact, the loss should be in 
units of utility, de Finetti was, of course, well aware of this, hence his Digression on 
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decisions and utilities (de Finetti 1974, Section 3.2). However, his "resolution" is not 
really satisfactory, and in any case, one must be extremely careful to avoid circularity, 
since utility is usually defined in terms of probability, something which we wish the 
concept of prevision to supercede. 

Frank Ramsey also recognised the subjective nature of probability, and the prob- 
lem with the usual betting rate definition (non-linear preferences). Ramsey gives his 
own solution to the problem in Ramsey (1931) (or Ramsey (1990)). However, I prefer 
a solution which works with probability currency, essentially by working with tickets 
in a raffle for a single fixed prize. Such a solution is discussed for upper and lower 
probabilities in Walley (1991, Section 2.2). The idea of using the lottery analogy 
for ensuring linear preference is discussed in Savage (1954, Chapter 5). This idea is 
developed and used for a probability currency approach to belief elicitation in Smith 
(1961). I particularly like Smith's "small diamond in a block of beeswax" formulation, 
but as Savage (1971) points out, this would not be a truly "utility- free currency for 
exploring a subject's opinions about the future of a diamond market"! 

Note that these definitions all rely in some way, on the concept of equally likely 
events. Indeed, I strongly suspect that any careful definition of subjective probability 
or prevision must necessarily make exactly such a recourse at some point. 

To make precise the definition of prevision for unbounded quantities requires a 
limiting argument, and one must be careful to ensure that all limits exist. There 
are, of course, many random quantities for which there does not exist a prevision, 
but such quantities are all unbounded. These will not be considered further, and 
so attention will be restricted to strictly bounded quantities, since this thesis is not 
really the place to discus the origin and foundations of prevision. 

For the rest of this thesis it will be assumed that there is a well-defined notion of 
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prevision which obeys the conditions of coherence 



P{X + Y)=P{X)+P{Y) 



(1.2) 



inf(X) < P{X) < sup(X) 



(1.3) 



discussed in Section 3.1.5 of de Finetti (1974). Note that the choice of P{X) is 
not viewed as some kind of decision problem, under a quadratic loss function, but 
as a one-dimensional summary of the random quantity X, with desirable linearity 

properties, such as 



and always of intrinsic interest, irrespective of any decision problem which may or 
may not be present. 

Prevision is simply a primitively defined expectation, E(X), of a random quantity, 
X, and the two concepts usually coincide provided that you are coherent. However, 
the notation E{X) is reserved to mean something which is in practice, the same, but 
conceptually different (a precise definition will be given later). The prevision of a 
vector or matrix of random quantities is the vector, or matrix of previsions. 

1.3 Covariance 

Covariance is a measure of linear association between two random quantities. In this 
thesis, the covariance, Cov{X,Y), between the random quantities X and Y will be 



P{aX + bY + cZ + ■■■) = aP{X) + bP{Y) + cP{Z) + ■■■ 



(1.4) 



defined by 



Coy{X,Y) = P{XY) -P{X)P{Y), VX, F 



(1.5) 
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The notion of covariance easily extends to vectors of random quantities. For two 
vectors of random quantities, X and 1^, the covariance matrix between them, Cov(X, 
Y), is defined by 

Coy{X,Y) = P{XY^) -P{X)P{Yf, yX,Y (1.6) 

The covariance of a vector of random quantities with itself, Cov(X,X), is the co- 
variance matrix for X, and is denoted Var(X). This thesis is concerned with the 
quantification of uncertainty about such matrices, and methods for learning about 
such matrices. 

1.4 Bayes linear methods 

The Bayes linear approach to subjective statistical inference is founded upon de 
Finetti's theory of prevision. The idea that the foundations of statistical inference 
could be based upon concept of revision of prevision was given in Goldstein (1981). 
A more gentle overview of the methodology from a slightly more simplistic viewpoint 
is given in Farrow and Goldstein (1993). Analysis is carried out using only first and 
second order belief specifications. Linear Bayesian methods have been considered 
previously in the literature. Some of the basic ideas and key results can be found in 
Stone (1963) and Hartigan (1969). 

Bayes linear methods require only a specification of prevision for every quantity 
under consideration, and also specifications for the covariance between every pair of 
quantities (in other words, the covariance matrix for the vector of all quantities under 
consideration). For example, if we are interested in a vector of random quantities, B, 
and wish to learn about it using a vector of observable quantities, D, we would form 
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the vector 

P{B) + Cov{B, D)yav{D)-^[D - P{D)] (1.7) 

Note that this is a vector which depends only on our prior specifications and the data. 
In fact, it is the Bayes linear rule for B, using the data, D. In this thesis, it will be 
assumed that all the variance matrices being considered are strictly positive definite, 
and hence invertible. In fact, such a restriction can nearly always be dropped. All 
that one requires is that the specifications for the covariance matrix are coherent; 
namely that the covariance matrix is non-negative definite. The corresponding re- 
sults can usually be obtained, simply by using any generalised inverse in place of an 
inverse. Moreover, the Moore-Penrose generalised inverse is a natural choice, with 
some desirable properties. More complete discussion of such issues can be found in 
Goldstein and Wooff (1995b). 

It is worth pointing out at this point, that the Bayes linear rule is just a projection 
in the space spanned by the random quantities and the observables, with respect to 
expected quadratic loss. Moreover, when the quantities involved in the equations are 
all indicators for events, this projection gives the usual form of conditional probability 



m\o) = (1.8) 



leading to the famous Theorem of Bayes (1763) (or Bayes (1958)), 



Thus, from a foundational perspective, the linear Bayes rule should be regarded as a 
generalisation of Bayes' Theorem, and not as some sort of approximation to it. This 
is the crux of the argument in Goldstein (1981). It is also worth noting that Bayes 
(1763) begins his essay by defining probability. Paraphrasing him, in more modern 
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parlance, he essentially defines the probability of an event to be the expected value 
of it's indicator, as does de Finetti (1974), and as do we. Obviously, he considered 
expectation to be a more primitive concept than that of probability. Conventional 
treatments, which make probability primitive, and define expected values with re- 
spect to probability measures, have perhaps obscured the simplicity of the concept of 
expectation. 

The use of second-order exchangeability as a fundamental modelling assumption 
was first discussed in Goldstein (1986a), but has since been discussed from a founda- 
tional perspective in Goldstein (1994). 

Second order belief structures have many useful properties with respect to linear 
adjustment, and these are discussed in Goldstein (1988a). The particular properties 
associated with exchangeable belief structures are discussed in Goldstein and Wooff 
(1995a), together with the important notion of Bayes linear sufficiency, first outlined 
in Goldstein (1986b), and more recently discussed in the context of an example, in 
Goldstein and O'Hagan (1995). Comparisons of belief structures are discussed in 
Goldstein (1991), and diagnostics for adjustments are given in Goldstein (1988b). 
Graphical summaries of diagnostics are given in Farrow and Goldstein (1995). 

Later, the geometric construction underlying second-order belief structures and 
linear belief adjustment will be given; for now it is sufficient to note that it is highly 
dependent on the covariance specifications made. Consequently, it is important that 
the specified covariances are appropriate. Bayes linear methods for estimating scale 
parameters were considered, both by Stone (1963) and Hartigan (1969), by fitting 
variance parameters linearly on quadratic data. Further, Stone points out that gen- 
uine variance estimation would require fourth order moment specifications. Modifying 
linear Bayes estimates due to information about variance was considered in Goldstein 
(1979) and Goldstein (1983). In some ways, this could be considered a precursor to 
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the work contained in this thesis. 

1.5 [B/D], the Bayes linear programming 
language 

[B/D] is a fully functional interpreted programming language which implements most 
features of the Bayes linear methodology. It is freely available to the academic com- 
munity over the World-Wide Web, via the URL http : //f ourier .dur . ac .uk: 8000/ 
stats/bd/. The program is outlined in WoofF (1992) and Goldstein and Wooff 
(1995b), and documented fully in Wooff (1995b). It provides a framework for belief 
specification and analysis, and facilities for carrying out adjustments using data, and 
producing diagnostic summaries of data adjustments. It also has facilities for produc- 
ing Bayes linear diagnostic influence diagrams, such as those described in Goldstein, 
Farrow, and Spiropoulos (1993) or Goldstein and Wooff (1995b). A tutorial intro- 
duction to [B/D] can be obtained from the sequence of technical reports Goldstein 
(1995), Wooff (1995a) and Wooff and Goldstein (1995). All of the examples in this 
thesis were implemented in [B/D], and reference will be made to the package, where 
this is felt to be appropriate. 

1.6 Revising beliefs about covariance structures 

Quantifying relationships between variables is of fundamental importance in Bayesian 
analysis. However, there are many difficulties associated even with learning about co- 
variances. For example, it is often difficult to make prior covariance specifications, but 
it is usually even harder to make the statements about the uncertainty in these covari- 
ance statements which are required in order to learn about the covariance statements 
from data. Further, a covariance structure is more than just a collection of random 
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quantities, so we should aim to analyse such structures in a space where they live 
naturally. In this thesis, such an approach will be developed and illustrated, based 
around a geometric representation for variance matrices and exploiting second-order 
exchangeability specifications for them. 

In Chapter 2, a methodology will be developed for the modelling and quantifi- 
cation of uncertainty about covariances between random variables. In Chapter 3, 
the geometric representation of covariance matrices is discussed. In Chapter 4, this 
representation is used to enable learning about the covariance structure underlying 
exchangeable random vectors. 

1.7 Covariance estimation for dynamic linear 
models 

In Chapter 5, the suggested approach to covariance estimation is applied to the devel- 
opment of a methodology for the revision of the underlying covariance structures for 
a dynamic linear model, free from any distributional restrictions, using Bayes linear 
estimators for the covariance matrices based on simple quadratic observables. This 
is done by constructing an inner-product space of random matrices containing both 
the underlying covariance matrices and observables predictive for them. Bayes linear 
estimates for the underlying matrices follow by orthogonal projection. 

The method is illustrated with data derived from the weekly sales of six leading 
brands of shampoo from a medium sized cash-and-carry depot. The sales are modelled 
taking into account the underlying demand and competition effects and the covariance 
structure over the resulting dynamic linear model is adjusted using the weekly sales 
data. 
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1.8 Diagnostic analysis of matrix adjustments 

In Chapter 6, methodology for the diagnostic analysis of matrix adjustments is dis- 
cussed. A framework is outlined whereby the a posteriori observed changes in belief 
may be compared to the a priori expected changes in belief for general Bayes linear 
problems. This theory provides a general unified framework for the a priori and a 
posteriori analysis of Bayes linear statistical problems. 

1.9 Distributional Bayesian approaches to 
covariance estimation 

Until recently, most authors have followed a Wishart conjugate prior approach to 
covariance matrix estimation (see for example, Evans (1965), Chen (1979), Haff (1980) 
or Dickey, Lindley, and Press (1985)). This approach, whilst tractable, places severe 
restrictions on the form of the prior distribution (there is only one hyper-parameter 
which expresses uncertainty about the matrix), and requires a multivariate normal 
assumption for the distribution of the residuals. 

More recently, a different approach has been proposed by Leonard and Hsu (1992). 
Essentially, they learn about the logarithm of the covariance matrix using the loga- 
rithm of sample covariance matrices. This solves the positivity problems associated 
with covariance revision, but imposes a tremendous specification burden, for param- 
eters without an intuitive interpretation. Further, they require a joint multivariate 
normal assumption for the elements of the logarithm of the covariance matrix, and 
since they rely on sampling methodology, have serious computational problems for 
large matrices. 

Brown, Le, and Zidek (1994), make further progress: working within a distribu- 
tional Bayesian paradigm, they develop a reasonably fiexible prior over the elements 
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of a covariance structure, and offer interpretations for the parameters that one is 
required to specify. However, this work is still restricted to multivariate Normal like- 
lihoods, and there is a weak restriction on the form of the mean structure for the 
data. 

Covariance matrix adjustment for dynamic linear models is reviewed in West and 
Harrison (1989). For multivariate time series, the observational covariance matrix 
can be updated for a class of models known as matrix normal models using a simple 
conjugate prior approach. However, the distributional assumptions required are ex- 
tremely restrictive, and there is no method which allows data-driven learning for the 
covariance matrix for the updating of the state vector. 

It would seem that those authors who have considered the problem of covariance 
matrix revision have come to the conclusion that it is such a difficult problem that 
they are prepared to make whatever distributional assumptions necessary in order to 
make the analysis as simple as possible. In particular, the sole justification for the 
Wishart conjugate prior approach seems to be that it makes the problem simple and 
tractable. The distributional assumptions made are usually such that expectations 
and conditional expectations have desirable linearity properties which simplify the 
problem. In this thesis, no distributional assumptions are required, but exactly these 
sorts of linearity properties are used as a starting point. 



Chapter 2 

Partial belief specification and 
exchangeability 

2.1 Partial belief specifications 

Given a set of random quantities of interest, and a selection of observable random 
quantities predictive for them, a distributional Bayesian approach would require a 
full joint probability distribution to be specified over all of the variables of interest, 
before any analysis could take place. On the contrary, all that is required for a Bayes 
linear analysis is the first and second moments of that joint distribution, since many 
aspects of the relationships between variables are captured by such specifications. 

The utility of working with first and second order characteristics has long been 
appreciated in other disciplines. For example, in classical mechanics, when summaris- 
ing the properties of a heavy object, one describes it using the position of the centre 
of mass (its first order characteristics) and its moments of inertia (its second order 
characteristics). One could work exclusively with the mass distribution function for 
the object, but often there would be little utility to be gained from doing so. Further, 
working with first and second order characteristics tends to make analyses simple, 
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tractable and robust. This analogy is taken much further in de Finetti (1974). Of 
course, the analogy only goes so far, and second-order characteristics can mean a lot 
more than just covariance specifications (for example, a full distributional Bayesian 
analysis of a statistical problem can be undertaken within this second-order frame- 
work). See Goldstein (1981) and Goldstein (1994) for a more complete discussion of 
the general foundational viewpoint. 

2.2 Exchangeable representations for covariances 

Let Xi, X2, ... be an infinite, second-order exchangeable sequence of random vectors, 
each of length r, namely a sequence for which Xk = {Xik, . . . , Xj-k)"^ , /i = P{Xi), E = 
Vai:{Xi) does not depend on i, and A = Cov(X;i;, Xi), k ^ / does not depend on k, I. 
In other words, the second-order beliefs are invariant under an arbitrary permutation 
of the index, k. Exchangeable sequences are the subjectivists generalisation of inde- 
pendent, identically distributed variables. Of course, they are not independent (or 
even uncorrelated), but they are very well behaved, as the following representation 
shows. 

From the given specification, we may use the second-order exchangeability repre- 
sentation theorem (Goldstein 1986a) to decompose X^ as 

Xk = M + Rk (2.1) 

where M and Rk are vectors of random quantities, and P{Rk) = 0, Cov{M, R^) = 
0, VA;, Cov(il;i;, Ri) = 0, V/c 7^ /, and the vectors Rk = {Ru, ■ ■ ■ , Rrk)^ form a second 
order exchangeable sequence. Here, M may be thought of as representing underlying 
population behaviour, and Rk as representing individual variation. We can now 
see that whilst the quantities, Xk, themselves are not necessarily uncorrelated, they 
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would be uncorrelated if there were no uncertainty about the underlying mean, M 
of the quantities. 

Bayes linear updating for such a representation would be informative for the 
means, and so Var(iV<f) would go to zero, given sufficient data. However, the data is 
not informative for future under a second-order analysis, and so we do not learn in 
any way about the matrix Yar{Rk). Yar{Rk) is the underlying covariance matrix for 
the data, and has an important effect on the way in which we learn about the means. 
A method for quantifying uncertainty about the matrix, Var(i?;i;), is now presented, 
a necessary step on the way to providing a method for learning about such a matrix. 

For the matrix A = (ai, 02, . . . , a„), define 



The vec operator and it's properties are discussed in Searle (1982, Section 12.9). 
Consider the sequence of r^-dimensional vectors 



representing the quadratic products of the residuals. It is assumed that the Yk are 
second-order exchangeable, and the additional specifications S' = Yav(Yk) and A' = 
Cov(l^;i;, 1^;), k ^ / are expressed. Once again, E' and A' should be specified directly. 
Note however, that if a multivariate normal assumption was felt appropriate for the 
Rk, then E' — A' may be inferred from the fourth moments of that distribution. This 
is the subject of the next section. Of course, if one feels that some other distribution 
is more appropriate, then the moments of that distribution may be used. Indeed, if 
one feels up to the task, it is preferable to make the specifications directly. One is 
only constrained by the usual conditions of coherence. 



vecA = (ai'^, 02'^, . . . , a^^)'^ 



(2.2) 



Yk = vec{RkRj) 



(2.3) 
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The exchangeability representation theorem may be used to decompose the vector 
Yk and then re- write the representation in matrix form in the following way. 

RkRj = V + Uk (2.4) 

where V and Uk are r x r random matrices such that P{Uk) = 0, Cov(vecy, vecUk) = 
Cov(vecC/fe, vecC/;) = 0, VA; 7^ I, and the vect/fc form a second-order exchangeable 
sequence. In particular, Var(vecy) = A' and Var(vec[/jfc) = S' — A' is not de- 
pendent on k. Here, V represents underlying covariance behaviour, and Uk rep- 
resents individual variation within the quadratic products of residuals (note that 
P{V) = P{RkRj) =ya.v{Rk)). 

If we observe a sample Xi, . . . , X„ of size n, then the sample covariance matrix 
takes the form 



S = -J2iXn.-X){X^-X)^ (2.5) 

1 " 

-J2{R^-R){Ru,-Rf (2.6) 

-L „ 1 



where Z = (l/n)Yl^^i Zi, 'iZi. Beliefs for the sample covariance matrix S are, 
by (2.6), uniquely determined by representation (2.4). Imagine forming a sequence 
of sample covariance matrices, 51,52,...; each based on n observations. Then the 
covariance structure for the sample covariance matrices takes on the following form. 

Cov(vecy, vec5;t) = Var(vecy) VA; (2.7) 

Cov(vec5fc,vec5;) = Var(vecy) VA; ^ / (2.8) 

Var(vec5,) = Var(vecV) + ^'^^^'^'^ vA; (2.9) 

n 



These results are derived in Appendix A. 
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Observing sample covariances from a sample of size n reduces uncertainty for V, 
the underlying covariance matrix, by linear fitting, but is uninformative for Uk for 
k > n. 

2.3 Automated specification of the residual 
quadratic structure 

Specification of the matrix Var(vecC/;t) is a very unfamiliar problem. However, if 
it is assumed that conditional upon knowledge of V {ie. Var(vecy) = 0), the Rk 
are multivariate normally distributed, then Var(vecC/fe) may be inferred from the 
fourth moments of the multivariate normal distribution with mean vector zero, and 
covariance matrix Var(i?;i;). The multivariate normal distribution is discussed in 
Searle (1982, Section 13.4), and quadratic forms thereof in Searle (1982, Section 



If X = {Xi, X2, X^, X^)^ is a multivariate normal vector such that P(-X) = 0, 
then 



where P and Q are any constant conformable matrices. This is a well known result 
from normal theory, and is discussed in Searle (1971) and Rohde and Tallis (1969). 
From this, one may easily deduce that 



P{X,X2XsX,) = P{X^X2)P{XsX,) + P{X^Xs)P{X2X,) + P{X,X,)P{X2Xs) (2.11) 



This may also be deduced from the fact that the moment generating function is 



13.5). 



Cov{X^ PX,X^QX) = 2Tr[PVar(X)gVar(X)] 



(2.10) 




(2.12) 
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and that 



mx{t) 



(2.13) 



The algebra is a little tedious, but leads to the more general result that previsions 
(expectations) of odd products are zero, and previsions of even products can be 
calculated by forming all possible pairs, then for each combination, form the product 
of the prevision of the pairs, then sum over pairs. A vectorised version of (2.11) is 
required. First some notation for two different "direct products" of matrices is needed. 
The direct product is discussed in Searle (1982, Section 10.7) and it's relationship with 
the vec operator is discussed in Searle (1982, Section 12.9). 

Definition 1 For r x r matrices A (having entry aij in row i, column j) and B 
(having entry bij in row i, column j) the (left) direct product, A^ B of A and B is 
defined to be the x matrix with the element aj^bim in row r{l — 1) + j, column 
r{m — 1) + k. 

Definition 2 For r x r matrices A (having entry aij in row i, column j) and B 
(having entry bij in row i, column j) the star product A-kB of A and B is defined to 
be the x matrix with the element ajkhm in row r{k — 1) + m, column r(l — 1) + j . 

It is worth noting that 



where I^^r is the (r^ry^ vec-permutation matrix. A full review of the definitions of, 
and the relationships between vec, vec-permutation matrices and direct products can 
be found in Henderson and Searle (1981). 

Now, given (2.11), it trivially follows that for the mean zero, MVN vector X, we 
have 



A-kB = Ir,r{A B) 



(2.14) 



Var(vec(XX'^)) = Var(X) (g) Var(X) + Var(X) ★ Var(X) 



(2.15) 
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This is the dispersion matrix for the p-dimensional Wishart distribution with 1 degree 
of freedom, and (2.15) is a special case of the Wishart dispersion result in Hender- 
son and Searle (1979). Equation (2.15) may be regarded as a primitive statement 
about the way in which the fourth-order moments depend on the second-order mo- 
ments, thus weakening the requirement of a multivariate normality assumption. The 
distributional assumption was made only so that the fourth order moments may be 
deduced from the first and second-order moments. Whatever distributional assump- 
tion is made, if the fourth-order centred moments depend only on the second-order 
centred moments, then the function will be symmetric across covariances. Equation 
(2.15) represents one of the simplest symmetric functions on the covariance possible, 
and so the assumption of this form for the dependencies may be natural independently 
of any normality assumption. 

An analogy may be useful at this point. The usual second-order Bayes linear 
adjustment gives results identical to those which would have been obtained using 
a distributional Bayesian approach, together with the assumption of multivariate 
normality. However, the Bayes linear adjustment is used as a natural method for 
updating moments without making any distributional assumptions at all. In the same 
way, I suspect that (2.15) represents a natural way to assign the fourth-order moments 
using only the second-order moments, irrespective of any distributional assumptions. 

In all of the illustrative examples in this thesis, (2.15) was used in order to "de- 
duce" the form of the fourth order residual structure. A [B/D] macro was written 
which automatically formed quadratic products of desired variables, and made the 
residual specifications accordingly. 

It is appropriate to end this section with a major caveat. The specifications made 
for the quadratic (and indeed, any other) covariance structure must necessarily be 
statements of belief. Ultimately, any method which automatically assigns specifica- 
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tions for the fourth-order moments using lower-order moments is nothing more than 
a crude approximation (although there will often be situations where the analysis is 
quite insensitive to the precise specifications at this level). I would say that speci- 
fication of belief for quadratic products (and specification of belief, more generally) 
is an area that should be given more attention. However, the requirement that all 
such specifications must be made in order simply to revise beliefs about covariance 
structures is unreasonable, and is one of the central themes of this thesis. In the 
next chapter, a structure will be developed in such a way that the minimum speci- 
fication required in order to carry out such analyses is vastly reduced, consequently 
diminishing the need to resort to techniques such as those outlined in this section. 

2.4 Bayes linear adjustment for the quadratic 
structure 

Let the sample covariance matrix S, have elements Sij, and the underlying covari- 
ance matrix, V, have elements Vij. Form the vector space, V of all (real) linear 
combinations of the these elements (together with the unit constant, 1). 



V = span{l, Vij, Sij\yi,j} 



(2.16) 



and then define the inner-product, (•, •) : V x V ^ R via 



{x,Y) = P{XY), yx,Y ev 



(2.17) 



Note that this inner-product induces the distance (or loss) function 



d{x, Y) = P{[x - Yf), yx, e V 



(2.18) 
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Now define the observable subspace D CV via 

D = span{l,Sij\yi,j} (2.19) 

Now for any subspace G CV, define the adjusted expectation operator, Eg : V — > G 
to be the bounded linear operator which orthogonally projects Y E V into G. In 
particular, form Eu{Vij), Mi^j. These depend only on the data, and represent the 
Bayes linear rules for the Vij, given the data. Explicitly, using (1.7), (2.7) and (2.9), 

Er,(vecF) = P(vecF) + Cov(vecF, vecS')Var(vecS')-^[vecS' - P(vecF)] (2.20) 
= P(vecF) + Var(vecF) |var(vecF) + Z^^iXHS^j vec[5 - P(F)] 

(2.21) 

Further, if beliefs over Uk are assigned by MVN fitting, using (2.15), this becomes, 

EnivecV) = P(vecy) + Var(vecy) 

f , , , , Var(R) ® Yaiim + Var(i?) ★ \'ar(i?) 1 
X < Var(vecy) + — — — — \ 

xvec[S-F{V)] (2.22) 



They are related to posterior beliefs about the Vij in a way made explicit in Goldstein 
(1994). 

The specifications made, can therefore be used as a basis for a Bayes linear analysis 
of the covariance structures. However, for large matrices, the number of quantities 
involved in the adjustments will be prohibitively large (though simplifications could 
be made by focussing on small subsets of the problem). It would be desirable to 
analyse covariance matrices in a space where they live more naturally, exploiting 
their matrix structure. 



Chapter 3 



Bayes linear matrix spaces 



3.1 Bayes linear inner-products 

At the end of the last chapter, a special case of general construction of belief struc- 
tures was demonstrated. In general, given a collection of quantities of interest, 
B = [Bi, B2, . . .] and a collection of predictive observables, D = [Dq = 1, Di, D2, . . .], 
form the real vector space, V, of all linear combinations of these quantities 



V = span{B U D} 



(3.1) 



and then define an inner product, (•,-):VxV^RonV via 



{X,Y) = P{XY), VX,F e V 



(3.2) 



This induces the norm 



Xf = P(X2), VX G V 



(3.3) 
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which in turn induces the metric 

d{X, Yf = P{[X - Yf), VX, F e V (3.4) 

This is a natural metric to have on a space of random quantities. Note that if all 
quantities have zero prevision, then this inner-product simply corresponds to covari- 
ance. However, the inner-product has not been defined to be covariance, since two 
quantities should not be viewed as being "the same", simply because they have a 
correlation of one. Viewing them as being "the same" requires them to have the 
same prevision, and a correlation of one. Note that there is still a slight subtlety 
associated with the use of this inner-product, since strictly speaking, it is only an 
inner-product over equivalence classes of random quantities whose normed difference 
is zero. However, from a linear perspective, this inner-product captures exactly what 
is required. An introduction to the functional analytic ideas used in this chapter can 
be found in Kreyszig (1978). Where necessary, form the completion of the space V, 
and denote the resulting Hilbert space by H (see Kreyszig 1978, Section 3.2-3). 

For any closed subspace G C "H, define the adjusted expectation operator, Eg ■ 
H ^ G, to be such that, for all Y & T-[, Eg{Y) is the orthogonal projection of Y 
into G. Note then that Eug(Y) = P{Y), \/Y, and so where there is no confusion, 
the subscript is dropped, so that E(F) = P{Y). Note that expectation and adjusted 
expectation are defined entirely in terms of the inner-product, (•,•), and that the 
inner-product is defined in terms of prevision. 

By keeping distinct notation for the two concepts of prevision and adjusted ex- 
pectation, it is at all times clear whether or not we are dealing with a primitively 
defined prevision, or a projection in the relevant Hilbert space. Whilst this is not 
quite so critical in the case of scalar adjustments, it is very helpful for the spaces of 
random matrices which will be constructed in the next section. 



CHAPTER 3. BAYES LINEAR MATRIX SPACES 



34 



Also note that for any X & B, En{X) is the Bayes linear rule for X, given 
the data, (see Section 1.4) since the orthogonal projection of X into D is the linear 
combination of elements of D which minimises expected quadratic loss. Consequently, 



where D is the vector of elements of D. 

3.2 Spaces of random matrices 

In Chapter 2, we saw how beliefs about a covariance matrix may be revised, first 
by forming quadratic products of the scalar quantities, giving rise to the covariance 
matrix, and then specifying covariance matrices over the second-order exchangeable 
decompositions of all of these quantities, and then carrying out adjustment in the 
usual Bayes linear way. However, it is immediately obvious that faced with problems 
where the covariance matrix to learn about is large, the magnitude of belief speci- 
fication and computation required in order to carry out adjustment is going to be 
considerable. It would be desirable to create a framework whereby matrices may be 
analysed in a space where they may be either treated as a whole, or broken down into 
as many components as the belief analyst feels comfortable working with. For small 
problems, or in problems where a great deal of detailed knowledge about the inter- 
action of variables is known, it may well be desirable to work with the components 
of the matrices directly. However, faced with large problems, or problems where it is 
infeasible to elicit such detailed specifications, one may simply wish to make simple 
scalar statements representing uncertainty in the prior or sample matrices, and the 
"interaction" between them. It is perfectly possible to set up a framework where a 
Bayes linear analysis may take place given such limited specifications. 



Ed{X) = P{X) + Cov{X, D)yav{D)-^[D - P{D)] 



(3.5) 
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The representation which will allow us to treat a covariance matrix as a single 
object is now developed. Let 

B = [BuB2,...] (3.6) 

be a collection of random r x r non-negative definite, real symmetric matrices, repre- 
senting unknown matrices of interest. These might, for example, represent population 
covariance matrices. Let 

D=[D,,D2,...] (3.7) 

be another such collection, representing observable matrices (such as sample covari- 
ance matrices). Finally, form a collection of linearly independent r x r constant 
matrices such that Cr(i-i)+j is the matrix with a 1 in the {1,3^^ position, and zeros 
elsewhere, where i and j range from 1 to r and call this collection 

C=[Ci,...,C,2] (3.8) 

This collection of matrices is a basis for the space of constant r x r matrices. Next 
form a vector space 

N = span{B UCUD} (3.9) 

of all linear combinations of the elements of these collections, and define the inner- 
product (over equivalence classes) on J\f as 

(P,Q) = P(Tr(PQT)) yP,QeAf (3.10) 

which induces the norm 

||p||2 = P(||P|||), ypeAf (3.11) 
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which in turn induces the metric 



d{P,QY = P{\\P-Q\\l) \JP,Q^N, 



(3.12) 



where || • \\p denotes the Frobenius norm of a matrix. This is the sum of the squares 
of the elements, or equivalently, the sum of the squares of the eigenvalues. Where 
necessary, form the completion of the space. The complete inner-product space, or 
Hilbert space, is denoted by M.. 

Analogously with the revision of belief over scalar quantities (Goldstein 1981), we 
learn about the elements of the collection by orthogonal projection into closed 
subspaces of Ai spanned by elements of the collection C U D, in order to obtain 
the corresponding adjusted expectations, namely the linear combinations of sample 
covariance matrices which give our adjusted beliefs. 

Note also that the projection into the constant space, Ec, is such that 



and so where there is no confusion, we often drop the subscript to get E((3) = P{Q). 

3.3 Matrix inner-product 

Why choose the inner-product {P,Q) = P{Ti{PQ^j) for the matrix space? Are 
there other inner-products which would be equally appropriate? As for the Bayes 
linear theory for random scalars, all of the theory is developed for a general inner- 
product space, and so one is free to use any inner-product which one feels to be 
appropriate. However, there are foundational reasons for using this inner-product, 
since the induced norm is based on the expectation of a proper scoring rule for matrices 



Ec{Q)=P{Q), yQeM 



(3.13) 
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(Goldstein 1996), and so an argument for using this inner-product can be given which 
is similar to that used in Goldstein (1986b). Further, there is a sense in which the 
inner-product chosen here is the natural extension of the inner-product {X, Y) = 
P{XY) used for scalar quantities. 

Given a vector space, J\f of n x n random matrices, let t/ijj : Af — > Vij be the 
homomorphism which maps the matrix to it's element, for all i and j. For 

example, for any P e A/", (l>ij{P) is the {i, jY^ element of P — a random scalar. is a 
vector space of random scalars, (j)ij{J\f). Define an inner-product (•, : Vij xVij R 
over each of the vector spaces Vij, and henceforth regard them as inner-product 
spaces. Now define a new space Q to be the direct sum of the spaces Vij (direct sums 
are discussed in Section 3.3 of Kreyszig (1978)). 



n n 



Q=ee^.. (3.14) 

The inner-product, [•, •] on Q is uniquely determined by the inner-products on the 
subspaces it is composed of. 

n n 

\P^(l\ = J2J2iPij^Qij)ij^ ^P= {Pn,Pl2,---,Pnn),q= {Qu, Qu, ■ ■ ■ , Qnn) ^ Q (3.15) 

Ignoring the inner-products, the vector space, Q is isomorphic to the vector space J\f, 
via the isomorphism, (f) : Q ^ Af defined via 



(l>{Pn^Pl2, ■ ■ ■,Pnn) 



( Pll Pl2 ■■■ Pin \ 
P2l P22 ■ ■ ■ P2n 

V Pnl Pn2 ■■■ Pnn J 



(3.16) 



If one is prepared to accept (j) as an inner-product space isomorphism from Q, to A/", 
the inner-product {•, •} : A/" x A/" ^ R over A/" is induced. For example, if 0(p) = P, 
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and (j){q) = Q, then 



{P,Q} = [p,q] = j:j:{Pij,qij)ij (3.17) 



But if on each of the scalar spaces, the usual Bayes linear inner-product 

{Pij, qij)ij = PiPijQij), ypij, Qij e Vij (3.18) 

is used, one may deduce that 

{P,Q} = [P,q] (3.19) 

n n 

= T.T.nPijqij) (3.20) 
= P(Tr[PQT]), yP,QeAf (3.21) 

It is important to note that adopting the usual Bayes linear inner-product on the 
scalar subspaces in no way forces us to adopt the inner-product advocated for the 
matrix space. The spaces Q and J\f are only necessarily isomorphic when considered 
purely as vector spaces. The matrix inner-product for J\f is only implied given the 
additional specification that the inner-product spaces Q and J\f are isomorphic. Vec- 
tor and inner-product space isomorphisms are defined in Sections 2.8-8 and 3.2-2 of 
Kreyszig (1978), respectively. 

Note that by viewing the matrix space in this way, many desirable properties of 
the matrix inner-product become apparent, which link matrix and scalar spaces. An 
important property has already been mentioned — projection of a random matrix 
into the constant space gives it's prevision. Also, note that the induced norms are 
consistent. In other words, for any matrix P, 

\\P\\^ = {P,P} = 0^\\p,£ = {p,^,p,^) = (3.22) 
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This is important when it comes to matching up the completions of matrix and 
component scalar spaces. 

If all matrices of interest contain only one non-zero component (all in the same 
position), the inner product becomes {P,Q) = P{PijQij), inducing the distance 
d{P,QY = P{{Pij — QiiY), as for the usual Bayes linear theory for scalar quanti- 
ties. Further, when matrices are decomposed, the different subspaces representing 
different parts of the matrices remain orthogonal, preventing different subspaces from 
influencing one another. The matrix structure is a generalisation of the scalar Bayes 
linear structure, and scalar Bayes linear adjustments can be recovered by decompos- 
ing all variance structures to the one component level. Viewing the matrix space 
as a direct sum of a large number of orthogonal subspaces of matrix components is 
analagous to viewing a probability distribution as an amalgamation of a partition of 
indicator functions for the component events. 

The matrices we are considering do not have to be finite dimensional. All of the 
theory remains valid if we think in terms of representations of random bounded linear 
self-adjoint operators on a (possibly infinite-dimensional) vector space. 

3.4 General Bayes linear representations 
3.4.1 n-step exchangeable collections 

There is a common form of symmetry which often arises amongst ordered vectors 
of random quantities. It is essentially just a slightly weaker concept than that of 
(second-order) exchangeability. The covariance structure is invariant under arbitrary 
translations and reflections of the ordering, and the auto-correlation function becomes 
constant after some distance, n. We will call ordered vectors with this property, 
second-order n-step exchangeable. As we have seen, covariance may be interpreted 
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as an inner-product on a space of random quantities. This same symmetry also 
occurs, under the same sorts of circumstances, for collections of random matrices in 
a random matrix inner-product space. Hence, a concept of n-step exchangeability 
which is sufficiently general that it is also valid for spaces of matrices is required, and 
so the concept is formalised as follows. 

Let {Yjk\yj, k} be a collection of random entities of interest. Also form a maximal 
linearly independent collection of constant entities of the same type, and call this 
collection C = [Ci, C2, . . .]. When dealing with random scalars, C will consist of the 
single scalar, Ci = 1. In Section 3.2, the constant space for a collection of random 
matrices was described. Form the vector space 

V = span{Ci,Yjk\yiJ,k} (3.23) 

so that the random entities are now vectors within this space. Define an inner-product 
(•,•): V X V — > R on V. The inner-product should capture certain aspects of our 
beliefs about the relationships between the elements of V. Form the completion of the 
space V, and denote this Hilbert space by Ti. In such a general Bayes linear space, 
a bounded linear expectation function E(-) : H — > span{C}, is defined such that 
VF e H, E(F) is the orthogonal projection of Y into span{C}, with respect to the 
inner-product (•,•). 

Definition 3 //3n G N such that E(l^fc) = ej Vj, k, and 



= doij \k-l\=0 
= dlij |A; - ^1 = 1 



{Yik, Yji) = d^n-i)ij yij,\k - l\ = n - 1 
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{Yik,Yji) = Cij yi,j,\k-l\ >n (3.24) 

the collection {Yjk\'ij,k} is said to be generalised (second-order) n-step exchangeable 
over k. If n = 1, the collection is said to be generalised (second-order) exchangeable 
over k. 

□ 



3.4.2 Representation for n-step exchangeable collections 

Goldstein (1986a) constructs a general representation for second-order exchangeable 
collections. There is an analagous representation for collections with the weaker 
property of n-step exchangeability, constructed in a similar way. 

Theorem 1 Let {Yjk\yj,k} be generalised second-order n-step exchangeable over k 
with respect to the inner-product (•, •). Then the Yjk may be represented as 

Yjk = Mj + Rjk yj,k (3.25) 



where the Mj and Rjk have the following properties: 

E{Yju) = E(M,.), E(i?,.,) = yj,k (3.26) 

(M,,M,) = {Mi,Y^k)=Cij, {Mi, Rjk) = yi,j,k (3.27) 

{Rik, Rji) = (Yik, Rji) = (Yik, Yji) - Cij Vi, j, k, I (3.28) 

Further, the {Rjk\yj, k} are generalised second-order n-step exchangeable overk, with 
{Rik, Rji) = yi,j, \k-l\> n. 

Proof 
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Let 

-j^ m 

Mim = -Y.Yik Vz,m (3.29) 

Observe that the sequence Mn, Mi2, ... is CauchyMi ie. that {Mik—Mu, Mik—Mn) — > 
as k,l — > oo, which follows directly from the properties of n-step exchangeable 
sequences. Construct the quantity Mi to be the Cauchy limit of this sequence so that 

lim (Mi^, Y) = (Mi, Y) Vi, Wen (3.30) 

Continuity of the inner-product is given in 3.2-2 of Kreyszig (1978). Linearity of 
E(-) gives E{Mim) = Vz, m, and hence applying (3.30) for F G C we deduce 
E{Mi) = Ci. Define Rik via Rik = Xik — Mi Mi, so that E(i?2m) = Mi, m. The other 
properties of the representation follow directly from (3.30). □ 

As for the case of second-order exchangeability, the mean components of the repre- 
sentation, Mj, represent the quantities which may be learned about by linear fitting on 
the data. It is possible to resolve as much uncertainty as is wished about these quan- 
tities given a sufficient number of observations, by such linear fitting. Therefore, the 
n-step exchangeable collection {Yjk\Mj, k] with representation Yj^ = Mj + Rjk Mj, k 
will be said to identify the random quantities Mj, Mj. 

3.4.3 Example 

Consider the following simple time series model for a sequence of observations {Xi, X2, 
...}. 

Xt = Mt + Rt, Mt > 1 (3.31) 
Mt = Mt-, + St, Mt>2 (3.32) 
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Ml 



(3.33) 



Where the random quantities {Ri, Si\yi > 1} have zero prevision, are mutually un- 
correlated, and are such that Vai:{Ri) = r and Var(S'2) = s Vi > 1. Now form the one 
step differences of the observations. 



Applying Definition 3 to the space V = span{l, Xl\yi > 2} with the inner-product 
{X, Y) = P{XY), yX, r e V, we see that the {X;|Vz > 2} are (second-order) 2-step 
exchangeable. Applying Theorem 1, we see that the {Xj'|Vi > 2} identify zero. 

3.4.4 Matrix example 

Reconsider the exchangeable vector example developed in Chapter 2. Imagine forming 
a sequence of sample covariance matrices, {^i, 5*2, . . .}, each based upon n observa- 
tions. Then form the matrix vector space 



X'f — Xt — Xi 



t-i — 



Rt — Rt I + St 



(3.34) 



V = span{RiRi^ , il2-R2^, ■ ■ ■ , 'S'l, ^2, . . .} 



(3.35) 



(each Si is based upon n of the RjRj"^) and impose the inner-product 



{A, B) = P(Tr(AB^)), VA, B e V 



(3.36) 



Complete this space into a Hilbert space, M.. Limit points such as \'\ the under- 
lying covariance matrix, will be added to the space upon completion. It is clear 
that both the residual and sample covariance matrices are generalised (second-order) 
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exchangeable, and that they have exchangeable representations 



RkRk^ — V + Uk 



(3.37) 



and 



Sk = V + Tk 



(3.38) 



where {Tk,Tk) = {Uk,Uk)/n. This is deduced from equations (2.7), (2.8) and (2.9) 
using the consistency of the inner-products on the scalar and matrix spaces. It is 
clear that the sample covariance matrices identify the underlying covariance matrix, 
V. Hence we may learn as much as is desired about this matrix by linear fitting on 
sufficiently many sample covariance matrices. 

3.5 Primitive specification of the matrix 
inner-product 

This thesis is primarily concerned with making specifications for the random matrix 
inner-product by building it up from specifications made for the quadratic scalars 
of which the matrices are composed. In a sense, this is analogous to specifying an 
expectation of a random quantity by breaking it up over a partition of events and 
specifying probabilities over the partition. However, just as there are many advan- 
tages to making expectation primitive, and specifying expectations directly, so there 
are with matrix inner-products. Initially, it may seem difficult to make primitive spec- 
ifications for the matrix object inner-product, simply because it is a very unfamiliar 
problem. Nevertheless a scheme for elicitation based upon graphical modelling of the 
relationships between matrices, followed by quantifications of uncertainty, and pro- 
portions of uncertainty resolved due to knowledge of parent nodes, could be used in a 
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way very similar to that often used for random scalars. In this way, the specification 
burden will be vastly reduced. Given a problem involving just a few (possibly large) 
matrices, all that will be required in order to carry out a basic analysis is a specifica- 
tion for the inner-product between every pair of matrices, rather than between every 
pair of scalars of which they are comprised. 



Chapter 4 

Covariance matrix adjustment for 
exchangeable vectors 

4.1 Introduction 

Consider the problem of learning about the covariance matrix for the r-dimensional 
exchangeable random vectors, {Xk\\/k e N}. As described in Chapter 2, form the 
exchangeable decomposition of the quadratic products of the residual vectors. 

RkRj = V + Uk (4.1) 

Let C be a basis for the constant observable matrices, and let D = [Di, D2, ■ ■ ■] be a 
collection of observable matrices predictive for the matrix, V in (4.1). Then form a 
vector space of random matrices, as described in Chapter 3, 

V = span{C UVLiD} (4.2) 
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and impose the usual matrix inner-product 



{A,B) = P{Tj:{AB'^j) 



(4.3) 



If necessary, form the completion of the space. Denote the resulting Hilbert space by 
M. 

4.2 Decomposing the variance structure 

As a simple example, D might consist only of the sample covariance matrix, S, 
based on n observations. In this case, the adjusted expectation for the "population" 
matrix would be a weighted linear combination of the prior and sample covariance 
matrices. However, by breaking down the sample covariance matrix into its compo- 
nent sub-matrices, one may resolve a greater proportion of our uncertainty about the 
"population" covariance matrix. 

For simplicity, consider the problem of learning about the covariance structure 
induced by representation (2.4) for 2-dimensional vectors. The covariance matrices 
will be 2 X 2. Consider the sample covariance matrix 




(4.4) 



and the corresponding "population" covariance matrix 




(4.5) 



In the notation of the previous chapter, attention could be restricted to 




(4.6) 
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where all 2 x 2 symmetric matrices can be constructed as linear combinations of the 
elements of C*. Using these collections, our adjusted expectation for V given Ds 
(and C) would take the form 

Ec+Ds{V) = (1 - a)P{V) + aS (4.7) 

where a is the coefficient of the orthogonal projection determined by the inner-product 
(4.3). This simple form arises because from (P6) of Goldstein (1988a), 

Ec+d{V) = Ec{V) + En-Ec{D){V) (4.8) 

Also, by (3.38), P{D) = P{S) = P{V), and for the constant space, C, Ec(-) = P(-)- 
Explicitly, the coefficient a takes the form: 



^ (y-p(nv-P(v)) .49. 

{S -P{S),S -P{S)) ^ ' 



ELiE?.inVar(K.,) 



Eti E •.i{nVar(y,,) + Var(t/,,)} 



However, to improve the precision of the estimates, the projection space could be 
enlarged by constructing 



5ii W 5i2 W 
j ' I 5i2 i ' I 522 



(4.11) 



Such a space is termed the individual variance collection. This allows different sample 
covariances to have different weights, if for example, there is higher prior uncertainty 
about some of the variances. Indeed, this may be taken a stage further, by construct- 



*In the last chapter, a basis for all real matrices was suggested. However, if all of the other 
matrices in the problem are symmetric (as will usually be the case) , then it is sufficient to work with 
a basis for only the symmetric matrices. All we require is that for all matrices, M, which are linear 
combinations of matrices in the problem, the constant matrix, P(M) € span{C}. 
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(4.12) 



This last collection is called the full variance collection. This not only allows the 
different covariances to have different weights, but also allows relationships between 
covariances to have an effect on the adjustment. If V is projected into the span of 
Dp and C, then the adjusted expectation for V will correspond precisely with the 
adjustment which would have been obtained using Bayes linear estimation on the 
quadratic products of the residuals in the scalar space. 

Breaking down the population matrix in the same way, we let 



Vi 



Vu 




V12 

Vl2 





V22 



(4.13) 



As the projection space is enlarged, more of the uncertainty about the variance 
structures is resolved, at the expense of doing more work. Generally projection should 
be carried out in as rich a space as is practicable, but for large variance matrices, the 
difference both in computational effort and in effort required for prior specification, 
between adjusting by Ds, Dj and Dp is substantial, so that a subjective assessment 
of the relative benefits of each adjustment must be made. 
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4.3.1 Examination performance 

A simple example, based on data relating to the examination performance of first 
year mathematics undergraduate students at Durham university, is presented. Those 
students who have only one A Level in mathematics are of particular interest, and 
so attention is restricted to these in this account. For illustrative purposes, focus on 
a few key variables, namely a summary of A Level performance {A), performance in 
the Christmas exams (X), and the end of year exam average (E). 
For the exchangeable decomposition of (say) A^, write 

Ak = MA + RA, (4.14) 

and for the exchangeable decomposition of (say) Rai^Rx^^ write 

Ra,Rx,=Vax + Uax, (4.15) 

so that, for example, Vax represents the underlying covariance between the A and 
X variables, and Uax^. represents the residual for the A;*'* observation. Construct the 
"population" and sample covariance matrices: 





f Vaa 


Vae 


Vax 


\ 




^ Saa 


Sae 


Sax 


\ 


V = 


Vae 


Vee 


Vex 




, s = 


Sae 


See 


Sex 






V Vax 


Vex 


Vxx 


) 




\ Sax 


Sex 


Sxx 


) 



A Bayes linear belief net was formed to represent beliefs about the relationships 
between the quadratic products of the residuals (Figure 4.1). Such graphs are dis- 
cussed in Goldstein (1990), and from a more general perspective in Smith (1990). In 
general, the graphs have the property that nodes are generalised conditionally inde- 
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Figure 4.1: A conditional linear independence graph for the mean components of the 
quadratic products of the residuals 

pendent given their parents, where generalised conditional independence is determined 
by a tertiary operator, • 11 •/• which obeys the following conditions. 

• BUC/{C + D) 

• BUC/D ^CUB/D 

• BU{C + D)/E ^ [{B n C/{D + E)} f^{BYL D/E}] 

We say that A is generalised conditionally independent of B given C if and only 
AIL B/C. Goldstein (1990) shows that adjusted orthogonality is a generalised 
conditional independence property. Explicity, 

BYLC/D ^Ec+d{B) =Ed{B) (4.17) 

defines a generalised conditional independence relation, which Smith (1990) refers 
to as weak conditional independence. This is the generalised conditional indepen- 
dence used to define the graph in Figure 4.1. The common variance node of Fig- 
ure 4.1 reflects beliefs about the positive correlation between variances. This node 
does not represent an observable quantity, but is modelled conceptually in order 
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to simplify the graph. Covariances are influenced by the corresponding variances. 
This graph was used to help structure the belief specification over the mean com- 
ponents of the variance structure. First a list ordering on the nodes is chosen: 
{CV,Vaa,Vxx,Vee,Vax,Vae,Vex} {CV denotes the node representing common 
variance). 

The nodes are modelled as linear combinations of an orthonormal basis for the ran- 
dom variables. The coefficients of the combinations obviously form a lower triangular 
matrix with respect to a list ordering on the nodes, so that 



V = AE 



(4.18) 



where V is the vector of list ordered nodes, E is a vector of orthonormal random 
variables, and A is the lower triangular matrix of coefficients. Clearly we have 



Var(V) = AA^ 



(4.19) 



and so A is the Choelesky triangle for the variance matrix. 

By thinking about the uncertainty of nodes, and the contribution to that uncer- 
tainty by the parents of that node, the Choelesky triangle of the covariance matrix 
for the vector of list ordered nodes was specified as follows: 



A 



V 



1 

0.71 
13.26 
7.07 
2.17 
3.25 
6.50 





0.71 




1.08 
1.62 







13.26 


1.08 


3.25 







7.07 


1.62 
3.25 








1.25 











1.86 




\ 






3.75 J 



(4.20) 



For example, the common variance node, CV was assigned a variance of 1, arbitrarily. 
Next, Vaa was assessed to have a variance of 1, and to be such that one half of the 
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standard deviation of Vaa would be resolved due to knowledge of the unobservable 
common variance node. That determines the second row of the matrix A, since the 
coefficients for the modelled orthonormal variables contributing CV and Vaa must 
be the same, and the sum of their squares must be 1. Next, the quantity Vxx was 
assessed to have a standard deviation of 18.75, and to be such that one half of it's 
variance would be resolved due to knowledge of the common variance node, thus 
determining the third row. Similar specifications were made in order to determine 
the rest of the matrix. 

This implies the covariance matrix, M = AA^: 



/ 


1 


0.71 


13.26 


7.07 


2.17 


3.25 


6 


50 \ 




0.71 


1 


9.38 


5.00 


2.30 


3.44 


4 


59 




13.26 


9.38 


351.56 


93.75 


43.06 


43.06 


129 


17 




7.07 


5.00 


93.75 


100.00 


15.31 


34.44 


68 


89 




2.17 


2.30 


43.06 


15.31 


8.59 


8.79 


17 


58 




3.25 


3.44 


43.06 


34.44 


8.79 


19.34 


26 


37 


V 


6.50 


4.59 


129.17 


68.89 


17.58 


26.37 


77 


34) 



(4.21) 



An alternative approach to the specification of a covariance matrix is given in Garth- 
waite and Dickey (1992). It may be instructive to look at the induced correlation 
matrix, M. 
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0.71 


0.71 


0.71 


0.74 


0.74 


0.74 \ 




0.71 
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0.5 


0.5 


0.78 


0.78 


0.52 




0.71 


0.5 


1 


0.5 


0.78 


0.52 


0.78 




0.71 


0.5 


0.5 


1 


0.52 


0.78 


0.78 




0.74 


0.78 


0.78 


0.52 


1 


0.68 


0.68 




0.74 


0.78 


0.52 


0.78 


0.68 


1 


0.68 


V 


0.74 


0.52 


0.78 


0.78 


0.68 


0.68 


1 / 



CHAPTER 4. EXCHANGEABLE VECTORS 



54 



Also consider the inverse of the covariance matrix: 



/ 


4.00 


-1.41 


-0.08 


-0.14 













-1.41 


5.00 


0.08 


0.15 


-0.98 


-0.65 







-0.08 


0.08 


0.01 


0.01 


-0.05 





-0.02 




-0.14 


0.15 


0.01 


0.05 





-0.07 


-0.03 







-0.98 


-0.05 





0.64 













-0.65 





-0.07 





0.28 





V 








-0.02 


-0.03 








0.07 



(4.23) 



Table 4.1 shows the zero and non-zero elements for the lower triangle of (□ 
represents a non-zero element). It can be seen that the zeros in the inverse covari- 
ance matrix correspond to the adjusted orthogonalities represented by the graph. For 
example, the zero in row Vex, column Vaa, means that Vex and Vaa are orthogo- 
nal after adjusting by the other variables. Such properties of graphical models and 
covariance matrices are discussed in Whittaker (1989). 
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□ 
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Table 4.1: The conditional independences implied by Figure 4.1 

Specifications are also required over the residual components of the variance 
structure. These specifications are more difficult to make, since we are not used 
to thinking about such quantities. In this example, for simplicity, belief specifica- 
tions over the residual structure were chosen to be consistent with those imposed 
under a multivariate normal specification corresponding to the prior specifications 
over the elements Rik, as discussed in Section 2.3. With respect to the ordering 
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{Vaa, Vxx, Vee, Vax, Vae, Vex}, the residual covariance matrix, A^, took the form 



N 



I 127 496 248 251 178 351 \ 

496 19997 5626 3150 1670 10607 

248 5626 6332 1182 1254 5969 

251 3150 1182 1046 599 1949 

178 1671 1254 599 573 1477 

V 351 10607 5969 1949 1477 8439 / 



(4.24) 



Having made specifications over the quadratic products of residuals, beliefs over all 
relevant covariance matrices are determined. 

From the sample covariance matrix, S = Ds, construct the individual variance 
collection, Dj (6 objects) and the full variance collection, De (36 objects), as well 
as the individual collection for the mean structure, Vi (6 objects). Form the random 
matrix space, A4 over all these objects, and investigate adjustments in this space. 



4.3.2 Quantitative analysis 



The prior covariance matrix was specified directly as follows: 



E{V) 



I 7.98 11.14 15.75 \ 

11.14 56.26 53.04 
^ 15.75 53.04 100.00 y 



(4.25) 



This matrix was specified using a graphical model, and a variance component ap- 
proach, as discussed for the quadratic structure in the last section. The sample 
covariance matrix (34 cases) is: 



/ 8.28 20.15 24.75 \ 

20.15 178.30 160.74 
V 24.75 160.74 258.26 / 



(4.26) 



Note that the sample covariance matrix is not too far from the prior specification. 
The adjusted matrices were formed as the appropriate linear combinations of the 
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observables, as described in Section 3.2, and derived explicitly for the simplest case 
in Section 4.2. 





( 8.08 


14.08 


18.69 \ 




En (V) = 


14.08 


96.08 


88.18 


(4.27) 




^ 18.69 


88.18 


151.65 y 






/ 8.04 


15.96 


17.72 \ 




^dAV) = 


15.96 


98.90 


78.63 


(4.28) 




17.72 


78.63 


159.21 j 






/ 8.30 


15.43 


20.06 \ 






15.43 


92.04 


80.66 


(4.29) 




20.06 


80.66 


156.79 j 





In fact, these matrices are the observed values of Ec_|_£)g(y), Ec+£)^(V) and ¥jc+Dpiy) 
respectively, but the C is dropped from the notation, and assumed implicitly to be 
included in the projection space. These adjusted matrices may be used as a basis 
for assessing our posterior beliefs about the matrix object (see Goldstein 1994 and 
Goldstein 1996). They represent prior inferences for posterior judgements. 

Note that the last matrix (4.29) represents the adjusted matrix which would have 
been obtained using a standard Bayes linear analysis on the quadratic products of the 
residuals. In this particular example, all adjusted matrices are positive definite. In 
general, we view negative eigenvalues in the revised structure as providing diagnostic 
warnings of possible conflicts between prior beliefs and the data, or as warning of 
inappropriate model choice or selection of projection space. 

It is desirable to be able to compare the estimates of V: ED^(y), E£)g(y), and 
^Di{y). Thus, the standard interpretive and diagnostic features of the Bayes linear 
methodology are used to assess the model and understand the adjustments taking 
place. 

Goldstein (1991) develops a formal framework for the comparison of covariance 
structures. Essentially, one focusses attention on the eigenstructure of the transfor- 
mation which maps one covariance structure to the other. The transformation which 
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accomplishes this is known as the belief transform, and it's eigenstructure contains all 
necessary information about the adjustment. Further details are given in Goldstein 
and Wooff (1994). 

Quantitatively, given any two covariance matrices, R and S, the belief transform 
is the linear transformation, T such that 

TR = S (4.30) 

Since all of the matrices are strictly positive definite, we can compute T as 

T = SR-^ (4.31) 

The eigenvalues of this matrix are those quoted in Tables 4.2, 4.3, 4.4 and 4.5. How- 
ever, the eigenvectors quoted are normalised with respect to the second matrix, in 
order to make interpretation easier. A discussion of this, and other issues which arise 
in the case where the matrices are not of full rank is given in Goldstein and Wooff 
(1995b). 



Variable 


Primary eigenvector 


Secondary eigenvector 


A 


0.12 


0.08 


E 


-0.11 


0.11 


X 


-0.01 


-0.12 


Eigenvalue ratio 


1.81 


1.44 



Table 4.2: Eigenstructure of the belief transform for the mapping from E{V) to 

Looking first at Table 4.2, we can see that for the first adjustment, variance was 
inflated by a factor 1.81 in a direction close to the difference between A and E, and 
that variance was inflated by a factor of 1.44 in a direction close to A + E — X . 
The other component had eigenvalue close to one. Table 4.3 shows that when Dj is 
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Variable 


Primary eigenvector 


A 


-0.11 


E 


-0.10 


X 


0.0 


Eigenvalue ratio 


1.51 



Table 4.3: Eigenstructure of the belief transform for the mapping from EDg{V) to 



Variable 


Smallest eigenvector 


A 


-0.16 


E 


-0.10 


X 


0.10 


Eigenvalue ratio 


0.81 



Table 4.4: Eigenstructure of the belief transform for the mapping from E^^ (V) to 



Variable 


Primary eigenvector 


Secondary eigenvector 


A 


0.25 


0.10 


E 


0.13 


-0.05 


X 


-0.09 


-0.06 


Eigenvalue ratio 


1.81 


1.62 



Table 4.5: Eigenstructure of the belief transform for the mapping from E[V) to 
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added to the adjustment, variance is inflated by a factor of 1.51 in a direction close 
to A + E. Other directions had eigenvalues close to one. Table 4.4 shows that adding 
the whole of Dp to the adjustment actually reduces variance by a factor of 0.81 in a 
direction close to A + E — X (other eigenvalues close to one). Table 4.5 shows the 
overall adjustment transformation. The overall transform is the composition of the 
three partial transforms (Goldstein 1991). Variance has been inflated by a factor of 
1.81 in a direction close to 2A + E, and by a factor of 1.62 in a direction close to 
2A — {E + X). Examination of the belief transforms in this way allows interpretive 
analysis of the changes in belief. 

4.4 Bayes linear influence diagrams for matrix 
objects 

Figure 4.2 shows a Bayes linear influence diagram representing the adjustments and 
corresponding diagnostic information for the random matrices. Such diagrams are de- 
scribed in detail in Goldstein, Farrow, and Spiropoulos (1993) for random quantities, 
with a similar interpretation for random matrices, where adjusted orthogonality is de- 
termined instead by the inner-product (3.10), so that our conditional independence 
relation, (4.17), becomes 



This is a generalised conditional independence property, as deflned in Smith (1990), 
and consequently, all of the usual properties of conditional independence graphs based 
upon such a relation will hold. Each node represents a space of covariance matrices. 
The outer shadings of the V node represent proportions of uncertainty about V re- 
solved by projection into the various spaces. Shadings start at 3 o'clock, and progress 




(4.32) 
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Figure 4.2: Diagnostic influence diagram summarising changes in expectation of the 
matrix objects 
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in an anti-clockwise fashion. The full circle represents the total uncertainty about the 
values of the space of covariance matrices. The first outer portion shaded represents 
the proportion of our uncertainty resolved by the sample covariance matrix alone 
(Ds). By comparing this with the first shaded portion for the V/ node, it can be seen 
that considerably more has been learned about the matrix object, than about the 
6-dimensional space over the individual variance collection. 

The next shading gives the additional information gained by using the individual 
collection as the projection space. This tells us a great deal more about the elements 
of the Vi collection, but little about the matrix object as a whole. The other shading 
shows the additional uncertainty resolved due to including the full variance collection 
in our projection space. There is information to be gained by enriching our projection 
space, but one must balance information gained with extra effort involved. Whether or 
not one chooses to include the complete variance collection will depend upon the size 
of the problem under consideration, and upon how much the answer really matters. 

It is no coincidence that the total amount of uncertainty resolved for the V and 
Vi nodes is the same in the case of fully decomposed structures. This is essentially 
because the V node represents the heart of the belief transform for the adjustment of 
the Vi node in this case (Goldstein 1990). This is mentioned only in passing, since it 
is of little relevance to the rest of the thesis. See Goldstein (1990) for a discussion of 
these kinds of properties of exchangeable adjustments. 

Shadings in the centres of the nodes are diagnostics based on the size and bearing 
of the adjustments, as described in Goldstein (1988b). Diagnostics for matrix adjust- 
ments are discussed more fully in Chapter 6. For now it suffices to mention that the 
more red shading present, the larger the diagnostic warning. The diagram shows a 
lot of shading for the adjustment of the V matrix, indicating a contradiction between 
the data and our prior beliefs. 
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4.5 Summary 

Analysing matrices in a space where they live naturally not only has great aesthetic 
appeal, but is very powerful and illuminating in practice. Working in this space 
simplifies the handling of large matrices, by reducing the number of quantities involved 
and summarising effects over the whole covariance structure. For the same reasons, 
diagnostic information about adjusted beliefs is easier to interpret. Structures may 
be decomposed as much or as little as desired. 

This approach allows learning for collections of covariance structures, and exam- 
ination of their relationships. It generalises the "element by element" approach to 
revision, which can be viewed as taking place in a subspace of the larger space. Ex- 
changeability representations lie at the heart of the methodology: all specifications 
are over observables, or quantities constructed from observables, rather than artificial 
model parameters, and no distributional assumptions for the data or the prior need 
be made. 



Chapter 5 

Covariance matrix adjustment for 
dynamic linear models 

5.1 Introduction 

The approach to covariance estimation is now applied to the development of a method- 
ology for the revision of the underlying covariance structures for a dynamic linear 
model, free from any distributional restrictions, using Bayes linear estimators for the 
covariance matrices based on simple quadratic observables. This is done by con- 
structing an inner-product space of random matrices containing both the underlying 
covariance matrices and observables predictive for them. Bayes linear estimates for 
the underlying matrices follow by orthogonal projection. 

The method is illustrated using data derived from the weekly sales of six leading 
brands of shampoo from a medium sized cash-and-carry depot. The sales are mod- 
elled taking into account the underlying demand and competition effects, and the 
covariance structure over the resulting dynamic linear model is adjusted using the 
weekly sales data. 

Covariance matrix adjustment for dynamic linear models is reviewed in West and 
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Harrison (1989). For multivariate time series, the observational covariance matrix 
can be updated for a class of models known as matrix normal models using a sim- 
ple conjugate prior approach. However, the distributional assumptions required are 
extremely restrictive, and it is difficult to learn about the covariance matrix for the 
updating of the state vector. 

5.2 The dynamic linear model 
5.2.1 The general model 

Let Xi, X2, ■ ■ ■ be an infinite sequence of random vectors, each of length r, such 
that Xt = (Xit, X2t, ■ ■ ■ , Xrt)^ . These vectors represent the observations at each 
time point. Suppose that we model the relationships between these vectors in the 
following way. 

Xt = F'^@t + ut (5.1) 
@t = G@t.i+u:t (5.2) 

The prior second-order specification is as follows: 

E{ut) = E{u:t) = 0, Var(0o) = S, Yav{ut) = V, Yav{u:t) = W, (5.3) 

Cov(0„ ut) = Cov{us, u^t) = Vs, t, Cov(0„ u)t) = Vs < i (5.4) 

Cov(a;s, oJi) = Cov(i>'s, i/t) = Ms^t (5.5) 

The state vector, @t is p dimensional, and the pxr and pxp dimensional matrices, F 
and G are assumed to be known. This is a second-order description of the (constant) 
multivariate time series dynamic linear model (DLM) described in West and Harrison 
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(1989). No distributional assumptions are made for any of the components in the 
model. Ways to learn about V and W from data are now described. West and 
Harrison (1989, Chapter 15) give a conjugate prior solution to the problem of learning 
about V for a class of these models known as matrix normal models, if one is prepared 
to make the necessary distributional assumptions. However methods for learning 
about the matrix W tend primarily to be ad hoc. The standard method for updating 
W is the discount factor approach, outlined in West and Harrison (1989), which 
simply inflates the W matrix at each iteration, so that the prior is swamped by the 
data at a given rate. Such an approach fails to utilise the fact that there is information 
about the W matrix present in the observations. 

5.2.2 Example 

As an illustration of the approach, consider a simple locally constant model for the 
sales of 6 leading brands of shampoo from a medium sized cash-and-carry depot. 
As above Xi,X2, ... is a sequence of random vectors, each of length 6, such that 
Xt = {Xit, X2t, ■ ■ ■ , Xet)'^ . The component X^ represents the (unknown) sales of 
brand i at time t, simply measured as a number of bottles. The vectors of sales are 
modelled as follows 

Xt = Qt + t^t yt (5.6) 

where 

@t = &t^,+u^t Vt (5.7) 

Prior beliefs are are given by (5.3), (5.4) and (5.5). Here it has been assumed that 
the process is locally constant, but with different underlying demands for each of 
the components of the series. This is a simple model, with no seasonal component, 
chosen to illustrate our methodology, and would be unrealistic if there were noticeable 



CHAPTER 5. DYNAMIC LINEAR MODELS 



66 



trends within any of the components of the series. However, for high dimensional 
time series with no obvious trends, it is often the case that, provided the covariance 
structure is appropriate, many of the interesting features of the series can be captured 
using just such a model. To this end covariances are introduced between components 
of the state vector and also for the way demand changes over time, and for the 
way observations vary from the underlying demand. A more detailed treatment of 
multivariate sales forecasting within a fully specified Bayesian framework is given 
by Queen, Smith, and James (1994) and Queen (1994) who consider the problem of 
developing a dynamic model for multivariate sales, and the development of a prior 
distribution with sufficient flexibility to capture the effects of market interaction. 
These methods are based upon the dynamic graphical model ideas discussed in Queen 
and Smith (1992) and Queen and Smith (1993). 

The second-order DLM requires the following quantiflcations. Firstly, the F and 
G matrices must be specified. Then, a priori specifications are needed for the ex- 
pectation of the initial state vector, /Xq = E(0o). Finally, specify the matrices 
E = Var(0o), V = Var(i/t), = Var(a;t)Vt. 

In the example, the specification for the mean vector was 



The following specifications were made for the covariance matrices, using exchange- 
ability judgements concerning the way in which the observations vary from their 
means. 



E(0o) = (10,9,9,8,8,7)^ 



(5.8) 



E 



/933333\ 
3 9 3 3 3 3 
3 3 9 3 3 3 
3 3 3 9 3 3 
3 3 3 3 9 3 

V333339y 



(5.9) 
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W 



/ 4 1 1 1 1 1 \ 
14 1111 
114 111 
1114 11 
11114 1 

V 1 1 1 1 1 4 / 



(5.10) 



V 



I 36 -4 -4 -4 -4 -4 \ 

-4 36 -4 -4 -4 -4 

-4 -4 36 -4 -4 -4 

-4 -4 -4 36 -4 -4 

-4 -4 -4 -4 36 -4 

\ -4 -4 -4 -4 -4 36 / 



(5.11) 



For example, for the matrix, E, it was decided to be appropriate to associate a stan- 
dard deviation of 3 with each of the variables. A correlation between variables of 1/2 
was also felt appropriate. In truth, there is perhaps more symmetry in these specifi- 
cations than is really appropriate, but specification is hard, and viewing variation in 
the sales of the various shampoos as second-order exchangeable greatly reduces the 
number of specifications which have to be made over the second order structure, and 
will allow further exchangeability modelling to simplify the fourth order specifications 
in later sections. 

Notice however that many aspects of the underlying mechanisms have been cap- 
tured by these specifications. In this model, ©t represents the vector of demands at 
time t. From the positive correlations in Var(0o), if the mean of one product turned 
out to be higher than anticipated, we would revise upwards beliefs about the means 
of the other products. Also, the positive correlations within VBiioJt) indicate that 
there is a common component to the demands, whilst the negative correlations within 
Var(i/t) indicate that brands are competing, and tend to succeed at the expense of 
one another. 
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5.2.3 Bayes linear analysis 

With the second-order specification that has been made, sales data may be used to 
carry out a Bayes linear analysis which will be informative for the mean of future 
observations. However, learning about the covariance matrices W = Var(t(;t) and 
V = Var(i/() will not occur. The methods of the previous chapters will now be 
adapted, in order to enable such learning. 

5.3 Quadratic products 

5.3.1 Exchangeable decomposition of unobservable 
products 

For the general DLM outlined in Section 5.2.1, form the quadratic products of ujt and 
Ut, namely \-ec{u: t(jtyt^) and \-ec{utUt^). We view \-ec{u:tu:t^) and \-ec{uti't^) to be 
second-order exchangeable over t. Explicitly, second-order beliefs over the vectors of 
quadratic products of residuals will remain invariant under the action of an arbitrary 
permutation of the t index (this is what I mean when describing a DLM as constant). 
As described in Section 2.2, using the second-order exchangeability representation 
theorem (Goldstein 1986a), an element of a second-order exchangeable collection of 
vectors may be represented as the sum of a mean vector, common to all elements, and 
a residual vector, uncorrelated with the mean vector and all other residual vectors. 
This representation may be applied to vec(aJiaJt^), and then re- written in matrix 
form as 

LJtU^t^ = V'^ + S'^ Vt>l (5.12) 
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as for (2.4), where and are random matrices of the same dimension as u:t<JiJt^ , 
P(vec(5,")) = Cov(vec(y'"), vec(5n) = 0, Vi (5.13) 

and 

Cov(vec(5i"), vec(5,"^)) = 0, Vs ^ t (5.14) 
Var(vec5,^) = Var(vec5n, Vs,t (5.15) 
Decomposing vec(i>'ti>'t^) similarly gives 

UtuJ = V'' + S;^ yt>l (5.16) 

with properties as for representation (5.12). Note that P(y^) = P(a?ta?t^) = Var(a;t) 
= W and so learning about will allow learning about the covariance matrix for the 
residuals for the state, and PCV) = P{L'ti/t^) = Var(f j) = V, and so learning about 
V will allow learning about the covariance matrix for the observational residuals. 
Representations (5.12) and (5.16) decompose uncertainty for oJt^t^ and UtU^ into 
two parts. Bayes linear updating (with enough data) will eliminate the aspects of 
uncertainty derived from uncertainty about and V . 

In order to conduct a Bayes linear analysis on the quadratic structure additional 
covariance specifications Var(vecy^), Var(vecy), Var(vecS'j^) and VwiyecS'^), for 
some t are needed. 

5.3.2 Example 

In the example, the Xt vector is 6-dimensional, and so the matrices, , , 
and S" are 6 x 6-dimensional. Consequently, the matrices Var(vecy"), Var(vecy'^), 
Var(vecS'^) and Var(vecS'^'') are 36 x 36-dimensional. When referring to the compo- 
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nents of Var(vecy^), the notation vfji^i will be used to denote the covariance between 
the {i^jY^ and {k, elements of V" . Similar notation is used for Var(vecy'^). Also 
s^jf^i and s\jf.i are used for the components of Var(vecS'^) and Var(vecS'j'^) respectively. 
The following covariance specifications were made for our example: 

= 9/4, Vi, = 9/16, Vz ^ J, = 1/5, Vi ^ j, (5.17) 

= 25, Vz, v'i^^^ = 1, Vz ^ J, = 4, Vz ^ J, (5.18) 

= 30, Vz, = 15, Vz ^ J, 4,, = 2500, Vz, = 1000, Vz ^ j. (5.19) 

For instance, vf^^^ is the variance specification for the {i,iy^ element of , which 
represents the underlying variance of the z*'* element of (Vf From (5.10), it has ex- 
pectation 4. From (5.7), this value governs the rate of change of 0^. By considering 
the range of plausible variances for the way @t might change over time, it was felt 
reasonable that a standard deviation specification of 3/2 should be made. The other 
specifications were made in a similar fashion. For simplicity in this example, the spec- 
ifications for s'^jj^i and s'^^i^i were made using the fourth moments of the multivariate 
normal distribution compatible with the given second-order structure as a guide. 

Whilst considerably more specifications would be required for a full Bayes or Bayes 
linear analysis, (5.17), (5.18), and (5.19) are sufficient for our purposes as these are 
the only specifications needed for the matrix object approach to belief revision which 
is to be taken in the later sections. 

There is a lot of symmetry in these values, greatly simplifying the specification, 
but once again, any non-negative covariance structure over the quadratic products is 
acceptable. Here assumptions of exchangeability over the variances and the covari- 
ances have been used. Many of the specifications made for the quadratic structure 
will be "averaged over" in the matrix object approach to covariance adjustment which 
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shall be developed, and so there is a limit to the effort that one would wish to put 
into very detailed specifications at this stage, since the suggested analysis will not be 
overly sensitive to the individual specifications. 

5.3.3 Observable quadratic terms 

First, certain linear combinations of the observables which do not involve the state 
vector, ©4, will be constructed. This is useful for various reasons, and in particular 
because it greatly reduces the prior specification required for the analysis of the 
quadratic structure. In this thesis, we shall mainly be concerned with DLMs for 
which there exists an r x r matrix H, such that HF^ = F^G. We call such DLMs 
two-step invertible. Note that a DLM will be two-step invertible if F is of full rank 
and r > p (as will often be the case for high-dimensional time series), and there will 
often be many matrices H satisfying HF^ = F^G. For example, H = F^G^F^^ 
(where represents any generalised inverse of F^) is a solution. Further, if F is 
of full rank, r < p and such a matrix exists, then H = F^GF{F^F)~^, and so H 
exists, if and only ii F'^GF{F'^F)-'^F'^ = F'^G. Note also that the matrix H'^ has the 
property that H'^F^ = F^G^. For a two-step invertible DLM, the following vectors 
of observables which do not involve the state vector may be constructed: 

X; = Xt- HXt I = F^ivt + vt- Hut-i yt>2 (5.20) 
X'l = Xt-H''Xt_2 = F^iVt + F'^GiVt-i + i^t-H''ut_2 Vt > 3 (5.21) 

Form the matrices of quadratic products, XjXj^ and X"X"^ Vt. These are pre- 
dictive for and V. 

Not all DLMs are two-step invertible, but for the constant dynamic linear model 
outlined in Section 5.2.1, it is always possible to construct linear combinations of the 



CHAPTER 5. DYNAMIC LINEAR MODELS 



72 



observations which do not involve the state, provided only that the constant dynamic 
linear model in question is observable (for a discussion of the very weak restriction 
of observability, see West and Harrison (1989, Chapter 5)). However, in general such 
linear combinations require more than two successive observations from the series. 
Consequently, for simplicity attention is restricted to the two-step invertible model. 
However, the approach is quite general, and may be applied similarly for any constant 
observable DLM, the only difference being that the covariance specification is more 
complicated, more quantities are involved in the adjustment, and the identifiability 
results are more complex, and in general, slightly weaker. 

5.3.4 Example 

For the example, F, G and H are all the identity, and so the one and two-step 
differences of the observables are formed: 

Xl'^ = Xt-Xt-i=u}t^Ut-Vt-i Vt>2 (5.22) 
Xf^ = Xt-Xt-2 = <^t^<^t-i^i^t-i^t-2 Vt>3 (5.23) 

The quadratic products of these, x'^^^ X^^^^ and xf' xf'''^ are then formed. These 
observables are predictive for and and so may be used to learn about the 
underlying covariance structure. All means and covariances that are required for the 
subsequent analysis are determined by specifications in (5.8), (5.9), (5.10), (5.11), 
(5.17), (5.18) and (5.19). The precise form of the covariance structure over the ob- 
servables is rather complex, and given below. Recall that the operators ® and ★ 
were defined in Definitions 1 and 2 respectively (page 28). The covariance structure 
over the quadratic products of the 1-step differences is determined by the following 
relations: 

Cov(vecF'^, vec(xf^xf^'^)) = Var(vecF'^) (5.24) 
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CoyivecV ,vec{x[^^ X[^^^)) = 2Var(vecF'') (5.25) 

CoY{^rec{x[^^x[^^^),^rec{x[^^x[^^^)) = Var(vecF") + 4Var(vecF'') 

+ Varjvec^n + Var(vec5i^_i) 
+ Var(vec5i") 

+ 2[E{V'' ®V'')+E{V''*V'')] 

+ 4[E(F'0 (g)E(F'^) +E(F'')*E(F'^) 

+ E(F") (g)E(F'^) + E(F'^)*E(F'')] 

(5.26) 

Cov(vec(xj^^xf^T),vec(xSi\x5i\T)) = 4(Var(vecF'^) + Var(vecF")) + Var(vec5i^_i) 

(5.27) 

Cov(vec(xf ^^), vec(xji^,xji^7)) = 4Var(vecF'') + Var(vecF'^) Vi, Vs > 2 

(5.28) 

The covariance structure over the quadratic products of the 2-step differences are 
given below. 

Cov(vecF'^,vec(xf^Jcf^'^)) = 2Var(vecF'^) (5.29) 

Cov(vecF'',vec(xS^^xS^^'^)) = 2Var(vecF'') (5.30) 

Cov(vec(xf^xf^T)^^gp^j5^(2)j5^(2)T-)) ^ 4Var(vecF") + 4Var(vecF'^) 

+ Var(vec5f) + Var(vec5i^_2) 

+ Var(vec5t^) + Var(vec5f_i) 

+ 2[E(y'^(g)y'^) +E(y'^^F'^) 

+ E(F^ ^F'-') +E(F^*F^)] 

+ 4[E(F'^) (g)E(F'^) +E(F'^)*E(F'^) 

+ E(F^) (g>E(F'0 + E(F^')*E(F'')] 

(5.31) 

Cov(vec(xS^^xf^^),vec(xi^\xJ^\^)) = 4[Var(vecF'') + Var(vecF'^')] + Var(vec5j^_i) 

(5.32) 

Cov(vec(xf^xf^T),vec(xS^^2^5^\T)) = 4[Var(vecF^) + Var(vecF'^)] + Var(vec5i^_2) 

(5.33) 

Cov(vec(xf ^ ^T), vec(xJ^^,xj^V^)) = 4Var(vecF'') + Var(vecF'^) Vt, Vs > 3 

(5.34) 
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The covariances between the one and two step differences are determined as follows: 



Cov(vec(xJ^^xj^^'^), vec(xg,xJ5/)) = 4Var(vecF'') + 2Var(vecF'^) Vi, Vs > 3 

(5.35) 

Cov(vec(xj^^xj^^T), vec(x[j2^i^^/)) = 4Var(vecF'') + 2Var(vecF'") + Var(vec5t_2) 

(5.36) 

CoY{wec{x[^^x[^^^),^rec{xf^_^^x[%'^)) = 2Var(vecF'") + 4Var(vecF'^) 

+ Var(vec5i^2) + Var(vec54"_i) 

+ E(F'0 «>E(F'^) +E(F'')*E(F'^) 

(5.37) 

Cov(vec(x5^^xS^^T)^^g(.(j^{2)j5^(2)T)) ^ 2Var(vecF'^) + 4Var(vecF^) 

+ Var(vec5^) + YaT{vecS^) 

+ E{V") (g) E{V^') + E{V") ^ E{V^') 

+ E{V^) ^EiV) +E{V^)*E{V'') 

(5.38) 

Cov{vec{x[^^x[^^^),vec{x[%X^^\^)) = 4Var(vecF'') + 2Var(vecF'^) + \ai{vecS^_^) 

(5.39) 

Cov{vec{x[^^x[^^^),vec{x'^\x'^]j)) = 4Var(vecF^) + 2Var(vecF'^) Vi, Vs > 2 

(5.40) 

These results are obtained by focussing on a general element of a matrix on the 
left hand side, and then substituting into the left hand sides the definition (5.22) 
and (5.23), expanding the covariances, substituting representations (5.12) and (5.16), 
and then simplifying the result using known orthogonalities to deduce the general 
element of the matrices on the right hand side. However, there are several hundred 
terms in some of the expansions and a computer algebra package was used to ensure 
the accuracy of the results. Appendix B deals with the derivation of these results, 
using the REDUCE computer algebra system. 
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5.4.1 n-step exchangeable collections 

Recall that in Section 3.4 the concept of n-step exchangeability was introduced. The 
covariance structure over X[, X" and their quadratic products is second- order n-step 
exchangeable. In Section 5.5.1 an inner-product space of random matrices appropriate 
for dynamic linear models will be constructed, and collections of matrices within this 
space will be found which exhibit generalised second-order n-step exchangeability. 

More formally, form collections X* = {X^^XjjVi, j, Vt > 2} and X** = {X^JXj'jV 
i,j,yt > 3}, of the matrices {X[X't^\yt > 2} and {X'^X'^^l^t > 3} defined in Section 
5.3.3, where X'^^ denotes the i*'* component of the vector X[. Form a vector space, 
V consisting of all linear combinations of the elements of X* and X** and the unit 
constant, and define the inner-product on this space as {X, Y) = P{XY), MX, Y e 
V. We may easily check that X* is (second order) 2-step exchangeable over and 
that X** is 3-step exchangeable over t. 

5.4.2 Identification of the covariance structure underlying 
the DLM 

The n-step exchangeability representation theorem (Theorem 1) allows construction 
of models for the observable quadratic products which have been formed. The ele- 
ments of the collection {X[X'^\it > 2} for the two-step invertible DLM, are 2-step 
exchangeable over t. Using Theorem 1, construct the representation (3.25). The 
identified quantities may be constructed as the Cauchy limit of the arithmetic means 
of the elements. 
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Lemma 1 The 2-step exchangeable collection {X^X^^|Vt > 2} identify the matrix 

M' = F^V^F + + HVH^ (5.41) 

and the 3-step exchangeable collection {X"X""'"|Vt > 3} identify 

M" = F^GVG^F + F^V^F ^ HWH^^ (5.42) 

Proof 



1 ^ 

M' = lim — yX'X'^ (5.43) 

1 ^ 

= yi^^j^Y.iF^^t + i^t- Hut.i){F^u:t + Ut- Hut.^f (5.44) 



1 ^ 

I Z^T, . T 77T, . T rjT , , . T 77 



1 ^ 



-UtVt.i^H^ - Hi^t.iiVt^F - Hut-iut^) (5.45) 

J™ ^ E F'^^ti^t^F + i/ti/t^ + Hut ,^H^ (5.46) 

= F'^y'^F + F'' + i/y'^i?'^ (5.47) 

The derivation of M" is similar. □ 

Now since | [i/M'i/'^ - (M" - M')] = HVH^ we deduce that for a 2-step in- 
vertible series, the collection 

{ I [hx[x[^h^ - {x'lxr - x;x;t)] | w > 3} (5.48) 



identifies HVH'^. 
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Now \i r > p, H can always be chosen to be invertible. If r < since it has been 
assumed that F and G are of full rank, so is H, and so H is invertible. Consequently, 
it shall be assumed that H has been chosen to be invertible. Now, since \[M' — 
H-^{M" - M')H-^^] = and M' - V - HVH^ = F^V^F it is trivial to deduce 

Theorem 2 For a 2-step invertible series with invertible H , put 

M, = \ \X\X'^ - H-\X'lXf - X[X[^)H-'^] (5.49) 
(i) The collection of matrices {Mj|Vt > 3} identify . 

(a) The collection of matrices {x[X'^ - Mt - HMtH^ Vi > s} identify F'^V'^F. 

□ 

Up > r, the entire matrix is not identified. This would require a fuller selection 
of observables than we have considered here. The identified matrix, F^V^F, is the 
transformation of with the most immediate effect on the adjustment, since it is 
the contribution to the uncertainty for X^ from ©t, given @t-i- To see this, note 
that 

Xt = F'^GQt.i + F^u:t + i^t (5.50) 



and so 



Var(Xt) = F'^'GVar(0i_i)G'^'F + F^P{V'^)F + P(y^) (5.51) 



givmg 



Also, 



ymiXt\@t-i) = P(F + V) (5.52) 



Var(Xt|0i) = P{V) (5.53) 
and so the additional uncertainty comes from the F^V^F. 
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5.4.3 Example 

In the example, Theorem 2 implies that the collection {X^^^X^^^^ — xj^^xj^^^|Vt > 
3} identify V^' and that the collection {X^^^xi^^'^ - ^X^X^^^'^lVt > 3} identify V. 
It is worth noting that in the simple example being considered here, we have that 

Xf) = XS^) + xSi\ (5.54) 

and so adding the collection {X^^X^^^\yt > 3} to the space spanned by the 
{X^^^X^^^^|Vt > 2} has the effect of introducing terms of the form X^^^x|i\ + 
X^i\x^^\ It is also worth noting that the collection of quantities 

{-l(xr)xSr + Xaxr^-)|vt>3} (5.55) 

identifies the matrix V, and that in this example, such a collection has smaller 
variance than the collection {xS^^xf^^ - ^x[^^ x[^^^\yt > 3}, and so represents a 
better collection with which to identify V (since uncertainty about the underlying 
quantities is resolved more quickly). 

By observing sales at an increasing (but finite) number of time points, one may 
resolve through linear fitting, as much uncertainty as is desired about the underlying 
covariance structure for the particular time series model being dealt with. 

If all fourth order prior belief specifications have been made, a simple Bayes linear 
analysis can be carried out in order to learn about the underlying covariance structure 
by adjusting the elements of V, by the elements of the observable matrices 
x[^^ x[^^^ , x[^^ X^^^ . However, for long, high-dimensional time series, the number 
of quantities involved in a full linear adjustment is extremely large, and so it is 
important to reduce the dimensionality of the problem, and preserve the inherent 
matrix structure. This is done by carrying out adjustments in a matrix space. 
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5.5.1 Formation of the matrix space 

To learn about r x r dimensional covariance matrices, first form the r x r constant 
matrix basis, by defining Cr{i-i)-\-j to be the matrix with a 1 in the {i, jY^ position, and 
zeros elsewhere, where i and j range from 1 to r. Call this collection C = [Ci, . . . , Cj.^]. 
Define the collections of matrices 

Xl = {X'^X'Z, HX',X',^H^, H-'X',X',^H-'^} (5.56) 

Xj = {X[X[\X'lX'l\HX[X[^H\H-'X[X[^H-^\H-^X'lXfH-'^], Vt > 3 

(5.57) 

Following the construction given in Chapter 3, form the real vector space, J\f whose 
elements are linear combinations of random r x r matrices as follows. 



M = span[CuXl\jXlu ...] (5.58) 
Define an inner-product on M via 



(A,5) = P(Tr[AB^]), \/A,BeAf (5.59) 

Complete A/" into a Hilbert space, Ai. Now since the collections whose mean limit 
points are HV^H^, V^, and F'^V'^F are present in the space, J\f, Cauchy limit points 
such as HV^H^, V, and F'^V'^F are present in the completed space, A4. The inner- 
product on this space is determined by our beliefs about the quadratic products, since 

{A, B) = j2jz [Cov(A,-,, Bjk) + P(A,,)P(5,fc)] VA, B e M (5.60) 

j=i k=i 
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Bayes linear adjustment may be carried out in this space by orthogonal projection of 
the matrices of interest into subspaces of observable matrices. As previously noted, 
this matrix approach to belief adjustment is a more direct way of getting at desirable 
linearity properties of conditional expectations for matrices, than via the somewhat 
artificial constructs such as the matrix Normal, inverse Wishart and matrix T distri- 
butions. The definitions and properties of such distributions are described in Dawid 
(1981), and their application to matrix Normal DLMs is discussed in West and Har- 
rison (1989, Section 15.4). Essentially, the notation and distributions are chosen 
so that they are consistent under marginalisation (Dawid 1981, Section 2), leading 
to simple linear conditional and predictive distributions for matrix Normal models 
(Dawid 1981, Section 8). Consequently, the updating equations for a matrix Normal 
DLM retain a simple linear form (West and Harrison 1989, Section 15.4.4). 

5.5.2 Example 

For our example, simply construct 

M = span{C, X«X«^, X«xi^)^, . . . , X^^^ xf^\ • • •} (5.61) 

and impose the inner-product (5.59), inducing the Hilbert space M., which contains 
limit points such as V and . Note that in order to evaluate (5.59), the specifi- 
cations needed are precisely those which were made in Section 5.3.2. The fact that 
many other aspects of the fourth order specifications are not necessary is very helpful, 
as this greatly reduces the specification burden. Often it is most straightforward to 
make direct primitive specifications for the matrix object inner-product. However, 
for simplicity here, the specifications for the matrix inner product have been built 
up from specifications over the scalar quadratic products, thus establishing the links 
between the scalar and matrix analysis, as discussed in Chapter 3. 
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5.5.3 n-step exchangeable matrix objects 

The definition of generalised n-step exchangeability applies directly to matrix ob- 
jects in the space A^. The collection of matrix objects {X[X^^\yt > 2} is 2-step 
exchangeable in the space Ai, and the collection {X"X""'"|Vt > 3} is 3-step exchange- 
able. This leads to a restatement of Theorem 2 for matrices in the space A4. The 
limit points are the matrices of limit points of their elements, due to the consistency 
of the inner-products on the scalar and matrix spaces, as shown in Section 3.3. 

Theorem 3 Put Mt = \ \x\X'^ - H'^iX'^X';^ - X[X['^)H-^^] . 

(i) The collection {M^l > 3} identifies V in M. 

(ii) The collection {x[X[^ -Mt- HMtH^\yt > 3} identifies F^V'F in M. 

□ 



5.5.4 Adjustment 

Consider observing n > 3 time points in the series. Form the matrix space, A1, and 
the observable subspace D„ C Ai 

Dn = span{C U X| U X| U . . . U X| } (5.62) 

Then the adjusted expectation map, ■ M. Dn, is the orthogonal projection 

into the Dn space. In particular, evaluate En^CV^) and Er,„{F^V^F), which are 
matrices in the Dn space (y and F^V^F are chosen because they are the matrices 
in Ai which we are most interested in). 
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5.6 Bayes linear adjustment for the example 
5.6.1 The adjusted covariance matrices 

Adjustments were carried out using 17 time points from the actual time series. The 
sample means for the 6 brands were 16.5, 3, 4.5, 27.5, 3.4 and 31. The matrix objects 
and V were adjusted in the following ways: 
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The prior specifications were given in (5.10) and (5.11). The adjusted matrices are 
perturbations of the prior expectations for the matrices. Notice that the variance 
associated with the fourth variable has been inflated considerably in both matrices. 
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Variable (Brand) 


Primary eigenvector 


Secondary eigenvector 


1 


-0.07 


0.21 


2 


-0.03 


-0.21 


3 


-0.02 


-0.12 


4 


0.39 


0.05 


5 


-0.07 


-0.13 


6 


-0.09 


0.31 


Eigenvalue ratio 


1.88 


1.47 



Table 5.1: Eigenstructure of the belief transform for the adjustment 



Variable (Brand) 


Primary eigenvector 


Secondary eigenvector 


1 


0.02 


-0.11 


2 


0.04 


0.02 


3 


0.03 


0.02 


4 


0.12 


0.03 


5 


0.02 


0.00 


6 


0.07 


-0.08 


Eigenvalue ratio 


1.89 


1.24 



Table 5.2: Eigenstructure of the belief transform for the adjustment 

The sample variances for the 17 cases of the six brands considered were 167, 22, 
37, 560, 18 and 427. Informally, it seems that there may indeed be more variability 
associated with the fourth (and last) variable. 

More formally, as described in Section 4.3.2, one may analyse the eigenstructure 
of the belief transform implied by the adjustment. Examining Table 5.1 shows that 
for the adjustment of the matrix V^, variance has been inflated by a factor of 1.88 
in a direction close to the fourth brand, and by a factor of 1.47 in a direction close 
to the difference between the second and the sum of the first and last brands. Other 
components had eigenvalues close to one, and hence were of minimal interest. Table 
5.2 shows that for the adjustment of the matrix V, variance has been inflated by a 
factor of 1.89 in a direction close to the fourth brand, and by a factor of 1.24 in a 
direction close to the difference between the first and last brands. 
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5.6.2 First order adjustment 

Since the aim is to predict sales more accurately, a sensible test of the procedure is to 
compare the performance of the first order model, (5.6), (5.7), using both the prior 
and adjusted covariance matrices. Carrying out the adjustment shows that the Bayes 
linear diagnostic warnings (the size and bearings of the adjustments, as described 
in Goldstein (1988b)) are noticeably closer to their expected values when using the 
adjusted matrices. For the given example, most of the size ratios for adjustments of 
the first order structure were noticeably closer to one using the adjusted covariance 
structure to predict future values, suggesting that the adjusted matrices match more 
closely with the forecast performance of the model. 

The improvements in forecast performance are graphically illustrated using di- 
agnostic Bayes linear infiuence diagrams. Two sequences of diagrams are given in 
Figures 5.1, 5.2, 5.3, 5.4 and 5.5. The top diagrams represent the usual Bayes linear 
adjustment for a dynamic linear model using the a priori covariance structure, and 
the bottom diagrams represent the adjustment using the updated covariance struc- 
ture. The shadings in the centre of the nodes are diagnostics based on the sizes of 
the adjustment. The larger the amount of red or blue, the stronger the diagnostic 
warning. Within each diagram, the upper layer of nodes represent the unobservable 
quantities in the models; namely Uf and ojf The lower row of nodes represent the 
observable quantities; namely the Xt- It can be seen that the lower series of dia- 
grams has consistently, slightly less diagnostic warning, indicating that the revised 
covariance structure matches more closely with the forecast performance of the model. 

Of course, since the revision of the covariance matrices was carried out using the 
first 17 weeks worth of data, one would hope that the diagnostics would improve 
when the first order adjustments are carried out for those weeks, using that revised 
covariance structure. A purely sample based estimate of the covariance structure 
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would probably perform even better! However, it is at least reassuring to see that 
the adjustment clearly hasn't made things worse. Further, if we focus attention on 
Figure 5.5, we see that the improved matching of the forecast performance of the 
model continues as first order adjustments continue on new data. Note that the 
diagnostics have not significantly improved using only 17 weeks of data. For these 
kinds of time series structures, it takes a large amount of data to learn a significant 
amount about the underlying variance structures, and this point is explored further 
in the next section. Diagnostics for scalar and matrix adjustments are discussed more 
fully in Chapter 6, and the calculations underlying the diagnostics for this example 
are discussed in Section 6.1.2. 

5.7 Iterative adjustments 
5.7.1 Methodological considerations 

For real-time problems, data will arrive for consideration one time point at a time. 
Suppose we are currently at time t. We must consider how best to use the available 
data in order to make predictions for future sales. We will wish to use all of the 
data to revise beliefs about the covariance structure for our dynamic linear model, 
and then carry out first order adjustments using the revised covariance structure. 
However, when we receive the data for time point t + 1, we will update beliefs for 
the covariance structure. Having done so, we will need to re-compute the first order 
adjustments for all time points, using the revised covariance structure. It will not be 
sufficient to simply add data for the last time point to the adjustment. The quadratic 
data is informative for the covariance structure underlying the entire first order series, 
and not just for the last time point. 
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Figure 5.1: First order adjustments; weeks 1-5 
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Figure 5.2: First order adjustments; weeks 5-9 
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Figure 5.3: First order adjustments; weeks 9-13 
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Figure 5.4: First order adjustments; weeks 13-17 
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Figure 5.5: First order adjustments; weeks 17-21 
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5.7.2 Illustrative example 

Consider Figure 5.6. Each small node represents 8 weeks of sales data for the top 
two brands of shampoo. Each node on the top row represents 16 matrix objects 
— one for each week for both X[X[^ and X'lX'P- . The large node in the middle 
represents the two matrix objects and V" . The bottom rows of small nodes each 
represent 8 two-dimensional vectors of weekly sales. The shadings for the bottom row 
of nodes correspond to those for a conventional stepwise Bayes linear adjustment of 
the structure. The shadings for the row above take the covariance structure updates 
into consideration. The shadings in the centre of the nodes are diagnostic warnings, 
based on the size and hearing of the adjustments. It is immediately obvious that the 
shadings for the adjustments which take into account available quadratic data improve 
proportionately with time, indicating an improvement in the understanding of the 
forecast performance of the model as the covariance structure tends to the "true" 
underlying structure, as the model learns about the variability of components and 
correlations across them, thus improving forecast estimates, and associated standard 
errors. Note though, that the rate of learning in this example is very slow. Each 
portion of shading on the large central node represents the proportion of uncertainty 
resolved by eight weeks of sales data. Even at the end of the process, having used 40 
weeks worth of quadratic sales data, only about one sixth of the uncertainty about 
the underlying covariance structure has been resolved. It is no wonder that in the 
last section, improvements in diagnostics using only 17 time points was marginal. 
Diagnostics for Bayes linear adjustments are discussed in Chapter 6. 
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Figure 5.6: Iterative first and second order adjustments 
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Good forecasting requires careful updating of the covariances within the time series 
structure. Informally, the degree of shrinkage between the prior and the data is up- 
dated, and relationships between variables are properly taken into account. Since one 
is able to adjust the covariance matrices for both the observational as well as state 
residuals, it is possible to understand the competition and demand effects taking 
place within the series. By taking a matrix object approach, the problem is greatly 
simplified by reducing it's dimensionality. This is important for both simplifying be- 
lief specification and belief adjustment, and also for interpretation of the structure 
of the adjustment and accompanying diagnostics. There are also the general advan- 
tages of the Bayes linear approach; namely of allowing complete flexibility for the 
prior specifications, without placing distributional restrictions on the data or model 
components. 

Of course, it is in principle possible to take a distributional Bayesian approach to 
this problem. Provided one is up to making a joint distributional statement about the 
components of the underlying matrices and the quadratic observables (a formidable 
task), one could use MCMC techniques, for example, to sample from the distribution 
to obtain posterior estimates. However, for non-trivial problems the problem would be 
sufficiently high-dimensional that assessing convergence would be extremely difficult. 
Also, it would be hard to assess the effect of the largely arbitrary choice of distribution 
made. Further, an a priori analysis to tackle design issues (such as how many time 
points are needed to reduce uncertainty about the underlying covariance matrices 
to one tenth of their a priori values) would be difficult to contemplate using such 
a framework. On the contrary, using the Bayes linear approach advocated in this 
chapter, a full a priori analysis is possible, allowing the handling of such design 
questions before receiving any data. 



Chapter 6 



Matrix adjustment diagnostics 



6.1 



Bayes linear diagnostics 



6.1.1 



The size and bearing of a scalar adjustment 



Before going on to discuss general Bayes linear diagnostics, it is important to outline 
the usual concept of size and bearing for scalar Bayes linear adjustments. These 
concepts are discussed more fully in Goldstein (1988b). For the adjustment of a 
collection of scalar random quantities, B = [Bi, B2, . . .], by a data space, D, we a 
priori form the random vector Eu{B) = {Eu{Bi),Eu{B2), . . . When the data 
space is observed to be D = d, the precise value of the vector Eu{B) = Ed{B) 
becomes known. The further away E^(fi) is from E{B) relative to prior standard 
deviation, the more surprising our change of belief. With this in mind, the size, 
S\zed{B) of the adjustment is defined as follows. 



Sized(-B) 



max 

Xespan{B} 



Var(X) 



(6.1) 
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Now let U = [Uq = 1, Ui, U2, ■ ■ ■] be an orthonormal basis for span{B} with respect 
to the inner-product (•,•). Then the bearing, Zd{B) is defined by 

Za{B) = J2EamUi (6.2) 

The bearing has the property that it is the quantity, X which maximises the expres- 
sion in (6.1). Hence, 

Sizerf(S) = Var(Z,(S)) = J2 ^liUi) (6.3) 

2=1 

Also note that the bearing is a complete summary of the adjustment, since 

E,(X) - E{X) = Cov(X, ZdiB)) (6.4) 

The size ratio, Srd{B) of an adjustment, is the magnitude of the size of the adjust- 
ment, relative to it's expected value. Hence, 

Size^(E) 

and E{Yar{Zr,{B))) is given by the trace of the belief transform for the adjustment, 
Cov{D,B)Yaj:{B)-^Cov{B,D)Yaj:{D)-^ (Goldstein 1988b). A size ratio close to 
one indicates observations consistent with prior beliefs. A very large size ratio is due 
to a surprisingly large change in belief, indicative of too small specifications made 
for prior uncertainties. A size ratio very close to zero is due to a surprisingly small 
change in belief, indicative of too large specifications made for prior uncertainties. 
It is a simple transformation of the size ratios which is used to shade the diagnostic 
warnings in the centres of nodes on diagnostic Bayes linear influence diagrams. 

To those readers familiar with the basic concepts of functional analysis, it should 
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now be clear that the bearing is derived from the Riesz representation for the bounded 
linear functional Ed{X) — E{X) (see Section 3.8-1 of Kreyszig (1978), for example). 

Note then, that the fact that the linear function Ed{X) — E{X) is in fact a func- 
tional (a function whose image is a subset of the real or complex numbers) is crucial 
to the concept of the bearing, and all of the elegant additive properties of the bear- 
ing for sequences of adjustments (Goldstein 1988b) depend upon the existence of a 
Riesz representation. Note also that when we are dealing with matrices, the function 
Ed{X) — E{X) is most certainly not a functional, and so in order to make any sense 
of the standard diagnostic and interpretive features of the Bayes linear methodology, 
a clarification of precisely what is meant, in general, by terms such as the size and 
bearing of the adjustment, is required. 

6.1.2 Examples 

Consider Figures 5.1, 5.2, 5.3, 5.4 and 5.5. The shadings in the centre of the nodes 
are a non-linear transformation of the size ratios for the depicted adjustments. Red 
shadings represent size ratios lager than one, and blue shadings represent size ratios 
smaller than one. The transformation has been chosen so that a shading of more 
than half of the available area is "surprising" . It is clear that the lower diagrams (the 
first order adjustments using the revised covariance structure) have slightly smaller 
shadings, representing size ratios closer to one, representing adjustment sizes closer 
to expected values. This is indicative of a covariance structure which more closely 
matches the forecast performance of the model. 

Figure 5.6 shows the improvement over time of the forecasting performance of the 
model as more covariance information becomes known. The diagnostic shadings for 
the row of first order adjustments including covariance updating become proportion- 
ately smaller than those without covariance adjustment as more information about 



CHAPTER 6. MATRIX ADJUSTMENT DIAGNOSTICS 



97 



the covariance structure becomes known. The diagnostics for the matrix adjustments, 
shown in the large central node, are calculated using the subspace bearing technique 
discussed in the next section. 

6.2 The subspace bearing 
6.2.1 Definition 

Here the bearing is generalised to the space of random matrices. For any given con- 
stant matrix, G, and projection space D, the bearing is defined to be the (essentially) 
unique random matrix, B, with the properties E{B) = and 

{A-E{A),B) = {E,{A),G) - {E{A),G), ^AeM (6.6) 

where Ed{A) represents the realisation oiEu{A) after observing D = d. This matrix is 
derived from the Riesz representation for the functional {Ed{A) — E(A), G). Different 
choices of the constant matrix, G, give information about different projections of the 
adjusted expectations. 

The choice of G which causes diagnostics to match up exactly with those for 
scalar Bayes linear adjustment in the case of exclusively one-component matrices, is 
the choice given by the constant matrix whose elements are all 1. To see this note 
that the for this choice of G, the functional of interest, {Ed{A) — E{A), G), becomes 

E^MAj) - E{A,j) (6.7) 

i j 

and so for a one-component matrix, whose only non-zero element is in row i, column 
j, this becomes Ed{Aij) — E{Aij), as for the usual case of scalar adjustments. The full 
correspondence follows since the one-dimensional matrix subspaces are orthogonal 
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under the matrix inner-product. 

6.2.2 Example 

Reconsider Figure 4.2 from Chapter 4. At the centre of the V node, red (blue) 
shadings represent changes in expectation larger (smaller) than expected a priori. 
Adjusting by the sample covariance matrix, Dg, caused a much larger change in 
expectation than expected a priori. This is evidence of over-confidence about our 
ability to predict the true value of the covariance matrix, and suggests re-examination 
the prior specification. Notice also that adding the full variance collection. Dp, to 
the adjustment had the potential to change our expectation considerably, but in fact, 
hardly changed it at all. This is perhaps evidence of overestimation of the importance 
of the covariance terms. 

6.2.3 A space of subspace bearings 

Note that the subspace bearing is defined for a given choice of constant matrix, G. 
Consequently, for each choice of constant matrix, there is a corresponding bearing. 
In particular, it may be of interest to study the structure of the map, (/) : C — > A4, 
which maps a given choice of constant matrix to it's corresponding bearing, and 
its image, (f){C) C J\A. Notice that the map (j) is linear, and so an analysis of the 
eigenstructure of the map 

(/)*(/): C ^ C (6.8) 

(where (j)* denotes the Hilbert-adjoint of the (j) operator) may be informative. See 
Section 6.3 for more details of this kind of construction and analysis. 
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6.2.4 Limitations of the subspace bearing 

The above sections show that the subspace bearing can be very useful for diagnosing 
matrix adjustments, and in particular, for linking up with diagnostics for the com- 
ponent scalar adjustments. However, in general there is only a sensible link with 
scalar diagnostic analyses in the case where all matrices are decomposed to the one- 
component level. When matrices are not decomposed to the one-component level, the 
interpretation of the subspace bearing is much less clear. However, one may not be 
prepared to undertake the specification and computational burden imposed by such 
a decomposition of structures. 

In the simple exchangeable matrices example discussed in the previous section, 
all specifications over the matrix components were made in order to deduce an inner- 
product over all sub-matrices, and so no problem arose in deducing or interpreting 
the bearing. However, in the dynamic linear model example of Chapter 5, only 
specifications sufficient to allow the deduction of the inner-product over the whole 
matrix objects were given. Hence in this case, the interpretation of the calculated 
subspace bearings are unclear. In the matrix adjustment example shown in Figure 
5.6, large diagnostic warnings are present, but it is totally unclear as to whether or 
not these should really be considered to be a serious problem. 

Moreover, in general one will often wish to make primitive specifications for the 
matrix inner-product, without making any recourse to scalar components, and in this 
case it is clear that obtaining beliefs over the scalar subspaces is not possible. Also, 
the methodology underlying the subspace bearing seems contrived, and the resulting 
definitions of sizes and bearings of adjustments do not seem to fit well with an intuitive 
concept of size and bearing for an adjustment. 

The above motivates the development of a more general concept of bearing, not 
tied the the concept of the Riesz representation for bounded linear functionals on 
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Hilbert space. 

6.3 General Bayes linear diagnostics 
6.3.1 Observation operators 

Definition 4 Consider the Hilbert space H with Hilbert subspaces, C and D such 
that C C D C 'H. C represents the subspace of known quantities, and D represents 
the subspace of observable quantities. Define the operator, Ex)(-) : H ^ D to be the 
orthogonal projection into the space D. Then, for a given realisation d of the space, 
D, define the observation operator, as follows: 



where for every X ^ D, Fd{X) is the realisation of X . Note that the observation 
operator, is linear. Also, the C space will, in general, be chosen so that F^ is 
necessarily bounded. Next define the observed adjusted expectation operator, E^, via 



Fd{-) -.D^C 



(6.9) 



(6.10) 



where 



e,{x) = fSd{x), yxen 



(6.11) 



Note that E^ is bounded and linear. 



□ 



CHAPTER 6. MATRIX ADJUSTMENT DIAGNOSTICS 



101 



6.3.2 General diagnostic measures 

The bounded linear operator, : — > C contains all the information about the 
observed adjustment. If we are interested in the effect of the observations on some 
particular Hilbert subspace, B cy,^ such that B ± C* , we can look at the restriction 
of Ed to B, 

E^Ib-.B^C (6.12) 
We may compare this with the restriction of the E^^ projection to B, 

Ed\b-B^D (6.13) 

in order to get an impression of the magnitude of the changes, compared to what we 
expected a priori. The structure of the E^ls operator is revealed by examining the 
eigenstructure of the operator, 

E^I^E^Ib : B ^ B (6.14) 

where E^l^ denotes the Hilbert-adjoint of the operator E^l^. The Hilbert-adjoint 
of an operator is defined in Kreyszig (1978, Section 3.9). This is compared to the 
eigenstructure of the operator, 

EdIbEdIb-.B^B (6.15) 
The belief transform for B adjusted by D, 

Eb\dEd\b : B ^ B (6.16) 

*We are interested in changes of expectation. It is simpler to choose to work with a space B 
such that B ± C than to allow general B, and look at the operator — E. It doesn't make any 
difference to the analysis however, since if B _L C, all elements of B have zero expectation. 
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is defined and discussed in Goldstein (1981). The eigenstructure of this operator 
contains all information about expected changes in belief. It is worth pointing out that 
the operator E^iI^E^jI^ is simply the belief transform Eb\d^d\b, since E^,]^ = E^Id. 
To see this, note that 

Ed\b-D^B (6.17) 

is defined by the property 

{ED\B{b),d) = {b,ED\Ud)), ybeB,deD (6.18) 

and that 

Eb\d-D^B (6.19) 

has the property 

{b,EB\D{d)) = {b,d) = {En\B{b),d), ybeB,deD (6.20) 

Properties of projection operators are discussed in Kreyszig (1978, Section 9.5). 

For the special case of scalar adjustments (dim(C) = 1), it is clear that the opera- 
tor Erf I ^ Erf I B has rank 1, and so has only one non-trivial eigen-pair. The eigenvalue of 
this operator is the size of the adjustment (note that the size of a scalar adjustment 
is defined to be the square of the induced norm of the E^ls operator, that the norm of 
an operator is equal to that of it's adjoint, and that the induced norm of the Edl^E^ls 
operator is equal to it's single eigenvalue), and it's corresponding eigenvector is known 
as the bearing of the adjustment. The properties of the eigenvalues and eigenvectors 
of the Edl^E^ls operator are discussed more fully for the general case later in this 
section. 

For general adjustments, the operator Edl^E^ls may have more than one non- 
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trivial eigen-pair, and so a generalisation of the concepts of size and bearing are 
required. 

One may generalise the concept of size in a number of ways. For example, one 
could define the size of the adjustment to be the maximum eigenvalue of the operator 
EdlfiEdls, which would lead to the (perhaps desirable) property 

Sizerf(5) = sup (Ed(X),Erf(X)) (6.21) 

XeB+C,E{X)=0,{X,X)=l 

sup ||Ed(X)||2 (6.22) 

XeB+C,E{X)=0,{X,X)=l 

However, one of the key properties of the size of the adjustment is the way it's 
expected values naturally sum over sequences of adjustments. If the size was defined 
as described above, such a property would be lost. Explicitly, it is the case that for 
orthogonal subspaces, Di and D2, 



E£)ie£'2lB — EdJb ® EdJb (6.23) 

Consequently, 



^Di®D2\*b^Di®D2\b — (E^ijIb ® E£)2|b)*(E£)Jb ® E^iJb) (6.24) 
= (EbJ^®Eb,|^)(EbJb®Eb,|b) (6.25) 
= ^Di\*b^Di\b + ^D2\*b^D2\b (6.26) 



In particular, 

Tr(E£,j0£,2 \*B^Di®D2\) = Tr(Ei)j \*b^Di \b) + T^i{^D2\*B^D2 \b) (6.27) 



CHAPTER 6. MATRIX ADJUSTMENT DIAGNOSTICS 



104 



and so the expected value of the trace of the ErfH^Erfls operator sums over orthog- 
onal sequential adjustments. To see that Tr(E£)|^E£)|s) is the expected value of 
Tr(Erf||,Erf|B), note that P(Tr(Erf||,Erf|B)) = Tr(EB||,P(F;Frf)Efl|B) =Tr(Efl||,Efl|B). 

Of course, for scalar adjustments, Tr(E^|^E^|B) is just the single eigenvalue of 
EdlfiEdls, and so it corresponds precisely with the usual definition of size. In general 
however, Tr(Ed|^Ed|B) is the sum of the eigenvalues of Edl^Edl^- This is useful for 
exactly the same reason that it's expected value, Tr(E£)||jE£)|s) (the trace of the 
belief transform) is useful for summarising a priori analysis of belief structures. Of 
course, just as it is of great value to examine in detail the full eigenstructure of the 
belief transform, E^jl^E^ils, it is always desirable to examine the full eigenstructure 
of Erf I Is in order to fully understand the observed changes in belief. To conclude, 
the following definitions are made. 

Definition 5 The size of the adjustment of B given d, Sized{B), is given by 

Sizerf(S) =Tr(Erf|^Erf|B) (6.28) 
This may be compared with it's a priori expected value, Sizefl(i?) given by 

SizeniB) = Ti:{Ed\*bEd\b) (6.29) 

The set of non-trivial eigenvectors of Edl^E^ls are known as the bearings of the 
adjustment. The eigenvalue corresponding to a particular bearing, is known as the 
size of that bearing. 

□ 

The bearings of the adjustment correspond to the elements of B whose observed 
expectations are different to their a priori expectations. The corresponding sizes give 
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an indication of the magnitude of the changes. To see this, let {Zi, Z2, . . . , Z^} be an 
orthonormal set of non-trivial eigenvectors of Edl^Edls with corresponding ordered 
eigenvalues {Ai, A2, . . . , A^}. Now for any A ^ B, write 

A = aiZi + ^2^2 + \- arZr + A (6.30) 

where A _L span{Zi, Z2, . . . , Zj.}. Consequently, 

Ed{A) = Ed{a,Zi + a2Z2 + --- + arZr + A) (6.31) 
= aiErf(Zi) + «2Ed(Z2) + • ■ • + arEdiZr) + Ed{A) (6.32) 
= aiEaiZi) + a2Ea{Z2) + ■ ■ ■ + arEaiZr) (6.33) 

(6.34) 

and so the bearings define the directions within the B space whose observed adjusted 
expectations difi^er from their a priori expected values. Note that when dim{C) = 1, 
there is only one bearing, which is just the usual scalar bearing for the adjustment. 

In practice, for finite dimensional problems, a matrix representation of the oper- 
ator Ed\b with respect to an orthonormal bases on B and D is formed, which may 
then be transposed to give a matrix representation of E£)||j. The two matrices can 
then be multiplied together, and the eigenstructure of the resulting matrix may be 
analysed to give the bearings, and their corresponding sizes. 

Explicitly, let {Bi, B2, . . . , B^} be an orthonormal basis for B, and let {Ci, C2, ■ ■ ■ , 
Cn} be an orthonormal basis for C. Now, for each B^, evaluate Ed{Bi) with respect 
to the basis on C. Then, for each i we have 



Ed{Bi) = aiiCi + a2iC2 H h ftmC, 



(6.35) 
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and so we deduce that with respect to the given bases, the operator £,^1^ is represented 
by the n x m matrix 



V «nl «n2 



(6.36) 



Consequently, the operator E^l^ is represented by the m x n matrix 



«12 «22 



OLnl \ 



OLr 



(6.37) 



The mxm matrix representation for the Ed|^Ed|s operator may then be obtained by 
multiplying together the matrix representations for E^l^ and E^l^. The eigenstructure 
of the resulting matrix may then be analysed in order to understand the structure of 
the EYil^Et^ls operator. 

Note that the theory developed in this section applies to a general Bayes linear 
adjustment where the random objects have a multi-dimensional constant space, and 
not just to the matrix space example developed in this thesis. 

6.3.3 Example 

In order to illustrate the definitions above with a concrete example, the simplest 
possible case will be used. Consider the adjustment of the matrix V by the space Ds 
of Chapter 4. Here, to simplify notation, D will be used to denote the space C + Ds- 
The expression for the adjusted expectation turned out to be 



En{V) = lp{V) + ^S 



(6.38) 
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This is (4.7) with a = 1/3. The adjustment space of interest is the space 



B = span{V - E{V)} 



(6.39) 



First consider the evaluation of Size£)(5). Put 



X 



23.618 



[V - E{V)] 



(6.40) 



This is an orthonormal basis for B. Put 



(6.41) 



and note that this has norm 1, and forms an orthonormal basis for the subspace of 
D being projected onto. Now, since 



Ed\b{V-E{V)) = -[S-E{V)] 



(6.42) 



we trivially deduce that 



Ed\b{X) = 0.573r 



(6.43) 



and so on span{X, Y}, the operator Ed\b has matrix representation (0.573). Conse- 
quently, E£)|^E£)|b has matrix representation (0.328), and so 



Sizec(S) = 0.328 



(6.44) 



Now evaluate Sized{B). 



I 0.3 9.01 9 \ 

9.01 122.04 107.69 
V 9 107.69 158.26 / 



(6.45) 
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51.75 (6.46) 



where s is the observed value of S. So put 



and note that this is an orthonormal basis for the subspace of C being projected onto. 
Then 

EdlsiX) = 0.731Z (6.48) 

and so on span{X,Z}, the operator E^ls has matrix representation (0.731). Conse- 
quently, Edl^Edls has matrix representation (0.534), and so 

Sizerf(S) = 0.534 (6.49) 

which gives a size ratio of 1.63. This tells us that the changes in belief were very 
slightly larger than expected a priori, but should not be considered a serious diag- 
nostic warning. 

6.3.4 Summary 

The work of this section gives a completely unified framework for both the a priori and 
a posteriori analysis of totally general Bayes linear statistical problems. A priori, the 
eigenstructure of the operator E^jI^E^iI^ (the belief transform) is analysed in order to 
understand expected changes in belief. A posteriori, analysis of the eigenstructure of 
the operator Edl^E^ls (the observed belief transform) may then be analysed in order 
to understand the observed changes in belief. The approach advocated is seen to 
generalise the diagnostic methodology used for scalar adjustments, and the generalised 
concept of size remains additive over sequences of orthogonal adjustments. It is also 
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the case that analysis of the eigenstructure of the operator F^Fa provides direct 
information on the discrepancy between prior specification and observations, though 
this is not obvious from the current discussion, and will be discussed further elsewhere. 

6.4 Negative eigenvalues in adjusted matrices 

All the matrices which have been considered in this thesis have been non-negative 
definite (NND). However, in general, a matrix which is revised in an unconstrained 
manner may in certain situations turn out not to be NND. Such adjusted matrices are 
described as incoherent. In general, negative eigenvalues in an adjusted matrix act as 
a diagnostic warning of a possible contradiction between prior belief specifications and 
the data, or indeed of inappropriate choice of model or projection space. However, 
if after careful reflection, it is decided that the prior speciflcations made were proper 
and appropriate, given the available information at the time, there is a "quick fix" 
for the problem which may under certain circumstances, be worth considering. 

Given a matrix which has been revised in an unconstrained manner, and has 
negative eigenvalues, one can construct a sure-loser argument which shows that the 
matrix formed by diagonalising the matrix, setting negative terms to zero, and then 
un-diagonalising (namely, the projection of the matrix into the subspace of coherent 
alternatives), necessarily has smaller loss associated with it than the original adjusted 
matrix, and therefore should be preferred to the original adjusted matrix. 

Consider a random nx n matrix A, with elements a^-. A specification for P{A) is 
made by specifying the matrix X with elements Xij to minimise the loss''' 

n n 

^ = EEK'-^^i)' = P-^IIf (6-50) 
i=i j=i 

t Allowing the more general loss, L = Yl^^iYl^^i -^iji^ij ~ ^ijY complicates the argument 
shghtly, but does not alter the conclusion. 
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Now, suppose that the specification made is incoherent {ie. there is a negative eigen- 
value in the matrix X). We wish to construct a matrix X which is coherent, and 
necessarily has smaller loss associated with it than X. Write the orthogonal decom- 
position of X as 

X = J:X*T,^ (6.51) 

Since the loss is rotationally invariant, we may transform to the orthogonal coordinate 
system implied by the matrix S. Write A* = S^y4S. Then 

L=\\A*-X*\\l (6.52) 

where the non-diagonal elements of A* have zero prevision. All of the non-diagonal 
elements of X* have been chosen to be zero. Since the matrix we are considering is 
incoherent, some of the diagonal elements of X* have been specified to be negative. 
However, since we know that the corresponding element of A* is at least zero, the 
matrix X* which is formed by setting the negative elements of X* to zero, necessarily 
has smaller loss associated with it than X*. Therefore, one would be a sure-looser 
not to prefer X* to X*. Back in our original coordinate system, X = SX*E^ is 
necessarily prefered to X. 

However, zero eigenvalues imply the existence of known combinations of variables, 
which is likely to contradict available data. For this reason, I feel that a posteriori 
correction of the matrix is undesirable, and that the constrained projection spaces 
discussed in the next chapter are a more promising way to go about ensuring that 
adjusted matrices are NND. 



Chapter 7 

Alternative approaches and further 
work 

7.1 Projections into non- negative spaces 
7.1.1 Motivation 

Consider the example of learning about the covariance matrix for a collection of ex- 
changeable random vectors given in Chapter 4. In this simplest case, projection into 
the space spanned by the prior expectation for the matrix and the sample covariance 
matrix was used. Both of these matrices are non-negative definite (NND), and the 
coefficients for the projection are necessarily greater than zero, whatever the belief 
specifications. In this case, the adjusted expectation is necessarily a positive combi- 
nation of NND matrices, and hence NND. This is clearly a desirable state of affairs, 
since an adjusted matrix which is not NND is incoherent. In the other examples, the 
adjusted matrices are not constrained to be NND. 

This issue is not specific to covariance matrix adjustment. For example, just 
consider the fitting of a strictly positive quantity on an unconstrained predictor. The 
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adjusted value for that quantity is not necessarily positive. Usually however, such 
a negative revision would be regarded as evidence of a contradiction between prior 
beliefs and observations or evidence of an inappropriate model, or projection space. 
Some Bayes linear theory needs to be developed for the consideration of non-linearly 
constrained problems. 

For the specific problem of constrained matrix adjustments, a few techniques due 
to special properties and decompositions of matrices are worth considering. 

7.1.2 Eigenspace projections 

Consider once more, the example of Chapter 4. 

RkRj = V + Uk (7.1) 



Also consider a sample covariance matrix, 

1 " 

S = r E(^- - ^){X^ - (7.2) 

predictive for V. Write 

S = S^eE (7.3) 

where E is orthonormal, and 6 = diag{di,d2, ■ ■ ■ ,9r)- Note that a priori, S and 6 
are both random. Now let 6^ be the matrix with 9i in the (z, i)*'* position, and zeros 
elsewhere. Then 

e = 01 + 62 + • • • + (7.4) 

Now define 

Si = E^e.E (7.5) 
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Note that the Si are observable, necessarily NND, and that 



S = Si + S2 + --- + Sr 



(7.6) 



Consequently, instead of projecting into the space 



span{C, S} 



(7.7) 



one may project into the much richer space 



span{C, Si, S2,..., Sr} 



(7.8) 



allowing resolution of a much greater proportion of uncertainty, whilst retaining a 
necessarily NND adjustment (provided that all coefficients are necessarily positive). 

Unfortunately in general, it is not possible to deduce the beliefs over the Si from 
the usual belief specifications made. However, when a better understanding of prim- 
itive specifications for matrices is obtained, this may turn out not be an insurmount- 
able problem. In fact, all that is really needed in order to carry out such an analysis 
is a primitive specification of belief for the eigenstructure of the matrices, rather than 
the elements of the matrices. Note also that certain modelling assumptions, such as 
the assumption of second-order exchangeability, may fix the eigenstructure, giving 
beliefs over the eigenstructure from beliefs over the scalars directly. 

7.1.3 Choelesky projections 

Reconsider the same example of learning about V from a predictive S. Form the 
Choelesky decomposition of S, 



S = AA^ 



(7.9) 
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where A is lower triangular and essentially unique. The Choelesky triangle is a useful 
parameterisation of the covariance matrix, because it has the property that there 
is a natural one to one correspondence between it and symmetric positive definite 
matrices. Unfortunately, the Choelesky triangle of a sum of matrices is not the sum 
of the Choelesky triangles, making it unclear exactly how one should decompose the 
triangle into a sensible projection space. 

7.1.4 Logarithmic transformations 

Another useful re-parameterisation of the covariance matrix is afforded by the matrix 
logarithm. The matrix logarithm is defined to be the inverse of the matrix exponential 
function, which is defined as follows: 

exp{A) = 7 + A + + + . . . (7.10) 

It is easy to see that exp and log are rotationally invariant. Explicitly, if ^4 = E6E^ 
is the eigen-decomposition of A, and 6 = diag{9i}, we have 

log{A) = Sio^(e)E^ = S diag{log{9i)}J:'^ (7.11) 

As for the Choelesky triangle, there is a correspondence between it and the positive 
definite matrices. There is the same problem as with the Choelesky triangles, in 
that the logarithm of a sum is not the sum of the logarithms. However, the sum of 
logarithms is the logarithm of the product, and hence if some sort of multiplicative 
model was felt appropriate, there may indeed be a sensible way to form a projection 
space of logarithmically transformed matrices. Of course, this will be informative for 
the logarithm of the covariance matrix, which is not something necessarily of direct 
interest. 
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Leonard and Hsu (1992) focus their attention on the logarithm of the covariance 
matrix for exactly these reasons. However, their assumption of joint multivariate 
normality for the distribution of the elements of the logarithm of the covariance 
matrix seems a little speculative. Of course, since they make inferences about the full 
joint distribution of the elements of the logarithm of the covariance matrix, they have 
no problem translating those inferences into statements about the untransformed 
matrix. However, the usual computational problems associated with the sampling 
methodology that they use make the approach impractical for large problems. 

More generally, some Bayes linear theory for multiplicative models and logarithmic 
transforms for scalars may prove an easier problem to tackle in the first instance, and 
may shed some light on the matrix version. 

7.1.5 Summary 

In conclusion, I think that some progress on the problem of ensuring NND belief 
revisions is possible, and will be of great value. However, I feel that it would be 
more appropriate to examine related problems from a scalar perspective in the first 
instance. 

7.2 Restricted estimates 

A related problem to that of restricting estimates to be NND, is that of handling 
general restrictions on the form of the covariance matrix. For example, it is easy to 
imagine that a particular linear combination of variables is known, and hence that 
there is a known eigenvector, with corresponding eigenvalue known to be zero in the 
covariance matrix. It would consequently be desirable to preserve such a structure in 
any adjusted matrices. 
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Also, two variables may be known to be independent for logical reasons and this 
would correspond to a zero in the covariance matrix which should be preserved. 

Also, there may be a particular conditional independence present which is known 
to be the case for logical reasons. This will correspond to a zero in the inverse 
of the covariance matrix, which is required to be preserved. For example, if belief 
specification has been based upon a graphical model, there may be some arcs which 
one feels should not be introduced. 

I imagine that the best way of handling such restrictions would be to choose a 
projection space which makes preserving such properties most natural. For example, 
an eigen-decomposition may be most appropriate for a known linear combination. The 
obvious element-wise decomposition demonstrated in Chapter 4 is easily modified to 
handle a known orthogonality — decompose all matrices to the one component level, 
and then neglect to include sample matrices corresponding to the known orthogonality 
in the projection space. In a similar vein, some kind of Choelesky or inverse matrix 
decomposition may prove to be the way to handle conditional orthogonalities. 

Again, there is clearly a great deal of important work to be done in this area, but 
progress looks quite possible. 

7.3 Diagnostics 

Matrix adjustment diagnostics still require attention. In the last chapter, some theory 
regarding general Bayes linear diagnostics was developed, but there is still work to 
be done. In general, without the concept of a unique bearing derived from a Riesz 
functional representation, the Bayes linear concept of the path correlation (Goldstein 
1988b) no longer makes a lot of sense. It is still possible to define a path correlation 
as a ratio of sizes of the various partial adjustments, but interpretation is now much 
less clear. 



Chapter 8 

Summary and conclusions 

8.1 Summary 

Quantifying relationships between variables is of fundamental importance in subjec- 
tive statistical inference. However, there are many difficulties associated even with 
learning about covariances. It is often difficult to make prior covariance specifica- 
tions, and usually even harder to make the statements about the uncertainty in these 
covariance statements which are required in order to learn about the covariance state- 
ments from data. Further, a covariance structure is more than just a collection of 
random quantities, so we should aim to analyse such structures in a space where they 
live naturally. In this thesis, such an approach was developed and illustrated, based 
around a geometric representation for variance matrices and exploiting second-order 
exchangeability specifications for them. 

All authors who have considered the problem of covariance matrix revision seem to 
have come to the conclusion that it is such a difficult problem that they are prepared 
to make whatever distributional assumptions necessary in order to make the analysis 
as simple as possible. The distributional assumptions they make are usually such that 
expectations and conditional expectations have desirable linearity properties, which 
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simplify the problem. In this thesis, no such distributional assumptions were made, 
but exactly those sorts of linearity properties were imposed and exploited. 

Specifications made for the scalar components of random matrices can be used as 
a basis for a Bayes linear analysis of the covariance structures. However, for large 
matrices, the number of quantities involved in the adjustments will be prohibitively 
large. It is therefore very desirable to consider matrices in a space where they may 
be treated as a single object, greatly reducing the specification burden. 

There is a common form of symmetry which often arises amongst ordered vec- 
tors of random quantities. It is essentially just a slightly weaker concept than that 
of (second-order) exchangeability. The covariance structure is invariant under arbi- 
trary translations and reflections of the ordering, and the auto-correlation function 
becomes constant after some distance, n. Ordered vectors with this property were 
called, second-order n-step exchangeable. This same symmetry also occurs, under the 
same sorts of circumstances, for collections of random matrices in a random matrix 
inner-product space. Hence, a concept of n-step exchangeability which was sufficiently 
general that it was also valid for spaces of matrices was developed, and a represen- 
tation theorem analagous to that for second-order exchangeability was derived. The 
representation theorem provides a simple way of decomposing variation for n-step 
exchangeable quantities into a part which is identifiable by the data, and a residual 
part, for which data is uninformative via linear fitting. 

Just as there are many advantages to making expectation primitive, and speci- 
fying expectations directly, so there are with matrix inner-products. A scheme for 
elicitation based upon graphical modelling of the relationships between matrices, and 
specification of uncertainty and uncertainty reduction could be used in a way very 
similar to that often used for random scalars. In this way, the specification burden 
will be vastly reduced. Given a problem involving just a few (possibly large) matrices. 
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all that will be required in order to carry out a basic analysis is a specification for 
the inner-product between every pair of matrices, rather than between every pair of 
scalars of which they are comprised. 

Analysing matrices in a space where they live naturally not only has great aesthetic 
appeal, but is very powerful and illuminating in practice. Working in this space 
simplifies the handling of large matrices, by reducing the number of quantities involved 
and summarising effects over the whole covariance structure. For the same reasons, 
diagnostic information about adjusted beliefs is easier to interpret. Structures may 
be decomposed as much or as little as is desired. 

This approach allows us to learn about collections of covariance structures, and 
examine their relationships. It generalises the "element by element" approach to 
revision, which can be viewed as taking place in a subspace of the larger space. 
Exchangeability representations lie at the heart of the methodology: all of our speci- 
fications are over observables, or quantities constructed from observables, rather than 
artificial model parameters, and no distributional assumptions for the data or the 
prior need be made. 

This approach to covariance estimation was applied to the development of a 
methodology for the revision of the underlying covariance structures for a dynamic 
linear model, using Bayes linear estimators for the covariance matrices based on sim- 
ple quadratic observables. This was done by constructing an inner-product space of 
random matrices containing both the underlying covariance matrices and observables 
predictive for them. Bayes linear estimates for the underlying matrices followed by 
orthogonal projection. 

Good forecasting requires careful updating of the covariances within the time 
series structure. Informally, the degree of shrinkage between the prior and the data 
requires updating, and relationships between variables must be properly taken into 
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account. Taking a matrix object approach greatly simplifies the problem by reducing 
it's dimensionality. This is important for both simplifying belief specification and 
belief adjustment, and also for interpretation of the structure of the adjustment and 
accompanying diagnostics. There are also the general advantages of the Bayes linear 
approach; namely of allowing complete flexibility for the prior specifications, without 
placing distributional restrictions on the data or model components. 

General a priori and a posteriori analysis of Bayes linear statistical problems has 
been shown to be possible within a single framework, via analysis of the eigenstructure 
of the belief transform and the observed belief transform. This will allow practical 
object-oriented implementations of the Bayes linear methodology which are not re- 
stricted to a particular type of random entity, making analysis of complex problems 
with unusual objects possible using standard procedures. 

Some of the work from this thesis is beginning to appear in the literature. In 
Wilkinson and Goldstein (1995a) we briefly describe a matrix inner-product, decom- 
positions of covariance matrices, and methods for learning about a covariance matrix 
for exchangeable random vectors. In Wilkinson and Goldstein (1995b) we discuss 
applications to covariance matrix revision for multivariate time series dynamic linear 
models, and the n-step exchangeability representation theorem. 

8.2 Conclusions 

Genuine subjective revision of belief for covariance matrices is a very difficult, but 
important problem. The methodology detailed in this thesis represents a useful contri- 
bution towards understanding the problem, and important methodological advances 
for carrying out revision in certain situations. However, I do not claim to have all 
of the answers — there are several important outstanding questions which still need 
addressing, and some of these are outlined in Chapter 7. The other important func- 
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tion of this thesis was to demonstrate the power and flexibihty of the Bayes linear 
methodology. It is impressive that the theory coped so well with the introduction of 
non-scalar quantities. No modification or re-interpretation of the theory was required 
in order to consider linear spaces of random matrices, and to carry out adjustments. 
Further, it is of considerable interest that the Bayes linear diagnostic theory required 
a generalisation before being applicable to spaces of matrices. The development of 
the work for this thesis has highlighted a necessity to re-evaluate the theory sur- 
rounding general Bayes linear adjustment diagnostics. The general theory for Bayes 
linear diagnostics presented in Section 6.3 is of considerable importance and interest 
independently of the matrix adjustment theory developed in this thesis. 

On completing this thesis, I feel more strongly than ever that the Bayes linear 
approach to subjective statistical inference is currently the best and most natural 
approach. It is however, equally clear that the theory is still in it's infancy, and 
that much work needs to be done. I feel that the approach to covariance estimation 
contained in this thesis captures very well the problems associated with covariance 
estimation, namely, the difficulties of belief specification, simplification, and ensuring 
sensible coherent revisions. By stripping away arbitrary distributional assumptions, 
the inherent problems and difficulties, often obscured by distributional and compu- 
tational issues, are revealed. Covariance estimation is a problem which I feel will be 
with us for some time to come. 



Appendix A 



Covariances between sample 
covariances 

This appendix is concerned with the derivation of equations (2.7), (2.8) and (2.9). 
Recall that we had exchangeable vectors, with exchangeable decomposition 

Xk = M + Rk (A.l) 

(2.1), and that the quadratic products of the residuals were considered exchangeable, 
so that 

RkRj = V + Uk (A.2) 

(2.4). The formation of a sequence of sample covariance matrices, each based upon 
n observations of the series, was considered. Now, from (2.6), we have 

1 " 

s, = -—rT.iK- R'){K - R'f (A.3) 

^ ^ w=l 

using an obvious extension of notation. Now using the notation Rf^ for the z*'* element 
of Rl, and the notation Vij for the {ijjy^ element of V, we have the following simple 
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results. 
Lemma 2 



Cov{V,j,RlRl,) = Cov{V,j,ViJ (A.4) 

Coy{V,„RlRl.) = lcov(y,„V^™) (A.5) 

Coy{V,j, RlRl,) = lcov(y,,,V^„) (A.6) 

CoY{Vij, RlRl.) = ^CoY{Vij,Vim) (A.7) 



yi,j,k,l,m,q, where ■ denotes the sample mean of the n cases. Also, 





Cov{V,,,Vim) + Covm„UU) 


(A.8) 


Gov {RikR'jk : Rtk^m- ) = 


^Cov{V,„ViJ + ^Covm„ULk) 


(A.9) 


GoviRikR'jk^ Rl^m-) = 


-Cov{Vij, Vim) + ^Cov(L'7^.;t, 67^^ 


(A.IO) 




^Cov(y,,, ViJ + ^Cov([/^.„ UL,) 


(A.ll) 


CoY{RlRl,RlRl) = 


^Cov(y,„ ViJ + ^Cov([/;^.„ c/L,) 


(A.12) 


Cov{RlRl,RlRl.) = 


^Cov(y,„ V^™) + ^Cov([/^.„ c/L,) 


(A.13) 



yi,j,k,l,m,q. Also, 









(A.14) 


^O'^iRikRjk^ Rln^m-) 


= -Cov(y,„v^„) 

n 




(A.15) 




= -Cow {Vij, Vim) ^ 

n 


■4cov(C/f.;t,C/f„J 


(A.16) 


'^^^{RtkR]-^R^lpKn) 


= ^Co^Vij^Vlm) 




(A.17) 


Co^iRlRl^RlRl) 


= ^G0w{Vj,Vlm)- 


f ^Cov(C/^.„[/f„,) 


(A.18) 


Cov{RlRl,RlRl.) 


= ^COV {Vij, Vim) - 


f ^Cov(c/;^.„[/f„,) 


(A.19) 
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□ 

Now, using the notation s\j for the element of Sq, we have the following 

results. 

Theorem 4 

Cov(y,„.L) = Cov(y,„v^„) (A.20) 
Cov(4,sL) = CoYiVi^Vij (A.21) 

Cov(4,.L) = Cov(V;„V^^) + ^^^^|^ (A.22) 

Proof 



1 " 

Cov(F,„.L) = -— rECov(F,„(i?f,-i?f.)«,-i?L)) (A.23) 
1 " 

- 5: Cov(F,„ - - + RfRl.) (A.24) 



1 



TEf^)cov(F,,-,V^„) (A.25) 

^ 1,-1 \ n- / 



= Cov(Fi,-,V;„) (A.26) 

(A.25) follows using equations (A.4) to (A.7). This gives (A.20). (A.21) and (A.22) 
can be derived similarly. □ 

Vectorising (A.20), (A.21) and (A.22) gives (2.7), (2.8) and (2.9). 



Appendix B 

Using REDUCE to assist DLM 
quadratic covariance calculations 



The covariance calculations for the quadratic products of one and two step differences, 
given in Section 5.3.4, were calculated using the REDUCE computer algebra system, 
described in Rayna (1987). Also, the precise form of the matrix inner product for the 
example discussed in Chapter 5 was deduced using the same REDUCE script. The 
following script was used to do all of the calculations required. 



% reduce program 

°/o for covariance calculations 

operator cov , ex,x,xx , a,r , v , s , va, sa; 

for all j ,k 

let cov(j ,k)=ex(j*k)-ex(j)*ex(k) ; 
for all j 

let ex(-j )=-ex( j ) ; 
for all j ,k 

let ex(j+k)=ex(j )+ex(k) ; 
for all j 

let ex(2*j)=2*ex(j) ; 
for all j 

let ex(4*j )=4*ex(j ) ; 
for all j 

let ex(j/2)=ex(j)/2; 
for all j 

let ex(j/4)=ex(j)/4; 

% model 

for all j ,tl 

let x(j ,tl)=a(j ,tl)+r(j ,tl)-r(j ,tl-l) ; 
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for all j ,tl 

let xx(j ,tl)=a(j ,tl)+a(j ,tl-l)+r(j ,tl)-r(j ,tl-2) ; 
for all j ,tl 

let ex(a(j ,tl))=0; 
for all j ,tl 

let ex(r(j ,tl))=0; 
for all j ,k,tl,t2 

let ex(a(j ,t2)*r(k,tl))=0; 
for all j ,k,tl 

let a(j ,tl)*a(k,tl)=va(j ,k)+sa(j ,k,tl) ; 
for all j ,k,tl 

let ex(sa(j ,k,tl))=0; 
for all j ,k,l,m,tl 

let ex(sa(j ,k,tl)*va(l,m))=0; 
for all j ,k,tl 

let r(j ,tl)*r(k,tl)=v(j ,k)+s(j ,k,tl) ; 
for all j ,k,tl 

let ex(s(j ,k,tl))=0; 
for all j ,k,l,m,tl 

let ex(s(j ,k,tl)*v(l,m))=0; 



for all j ,k,l,m 

let ex(v(j ,k)*va(l,m))=ex(va(l,m))*ex(v(j ,k)) ; 
for all j ,k,l,m,tl 

let ex(va(l,m)*s(j ,k,tl))=0; 
for all j ,k,l,m,tl 

let ex(sa(l,m,tl)*v(j ,k))=0; 
for all j ,k,l,m,tl,t2 

let ex(sa(l,m,t2)*s(j ,k,tl))=0; 



for all j,k,tl,t2 such that tl neq t2 

let ex(r(j ,tl)*r(k,t2))=0; 
for all j,k,tl,t2 such that tl neq t2 

let ex(a(j ,tl)*a(k,t2))=0; 
for all j ,k, 1 ,m, tl , t2 such that tl neq t2 

let ex(s(j ,k,tl)*s(l,m,t2))=0; 
for all j ,k, 1 ,m, tl , t2 such that tl neq t2 

let ex(sa(j ,k,tl)*sa(l,m,t2))=0; 



'/, simplifications 
for all j ,k,l,m,tl,t2 

let ex(a(k,tl)*r(j , t2) *va(l ,m) )=0 ; 
for all j ,k,l,m,tl,t2 

let ex(a(m,tl)*r(j , t2) *v(k, 1) )=0 ; 
for all j ,k,l,m,tl,t2,t3 

let ex(a(m,tl)*r(j , t2) *s (k, 1 , t3) )=0 ; 
for all j ,k,l,m,tl,t2,t3 

let ex(a(k,tl)*r(j , t2) *sa(l ,m, tS) )=0 ; 

for all j ,k, 1 ,m, tl , t2 such that tl neq t2 

let ex(r(j , tl) *r (k, t2) *v(l ,m) )=0 ; 
for all j ,k,l,m,tl,t2,t3 such that tl neq t2 

let ex(r(j , tl) *r (k, t2) *s (1 ,m, t3) )=0 ; 
for all j ,k, 1 ,m, tl , t2 such that tl neq t2 

let ex(r(j , tl) *r (k, t2) *va(l ,m) )=0 ; 
for all j ,k,l,m,tl,t2,t3 such that tl neq t2 

let ex(r(j , tl) *r (k, t2) *sa(l ,m, t3) )=0 ; 
for all j ,k,l,m,tl,t2,t3 such that tl neq t2 

let ex(a(l,tl)*a(m,t2)*s(j ,k,t3))=0; 
for all j ,k, 1 ,m, tl , t2 , t3 such that tl neq t2 

let ex(a(l,tl)*a(m,t2)*sa(j ,k,t3))=0; 
for all j ,k, 1 ,m, tl , t2 such that tl neq t2 

let ex(a(l,tl)*a(m,t2)*v(j ,k))=0; 
for all j ,k, 1 ,m, tl , t2 such that tl neq t2 

let ex(a(l,tl)*a(m,t2)*va(j ,k))=0; 

for all j ,k,l,m,tl,t2,t3,t4 

such that tl neq t2 and tl neq t3 and t2 neq t3 
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let ex(a(j , tl) *a(k, t2) *a(l , t3) *r (m, t4) )=0 ; 

for all j ,k,l,m,tl,t2,t3,t4 

such that tl neq t2 and t3 neq t4 

let ex(a(l,tl)*a(m,t2)*r(j , tS) *r (k, t4) )=0 ; 

for all j ,k,l,m,tl,t2,t3,t4 

such that t2 neq t3 and t3 neq t4 and t2 neq t4 
let ex(a(m,tl)*r(j , t2) *r (k, t3) *r (1 , t4) )=0 ; 

for all j ,k,l,m,tl,t2,t3,t4 

such that tl neq t2 and tl neq t3 and tl neq t4 
and t2 neq t3 and t2 neq t4 and t3 neq t4 
let ex(r(j , tl) *r (k, t2) *r (1 , t3) *r (m, t4) )=0 ; 

for all j ,k,l,m,tl,t2,t3,t4 

such that tl neq t2 and tl neq t3 and tl neq t4 
and t2 neq t3 and t2 neq t4 and t3 neq t4 
let ex(a(j , tl) *a(k, t2) *a(l , t3) *a(m, t4) )=0 ; 

% expressions 



% covariances for the one step diffs 



11 
12 
13 
14 
15 
16 



=cov(va(j ,k) ,x(l,tl)*x(m,tl)) ; 
=cov(v(j ,k) ,x(l,tl)*x(m,tl)) ; 
=cov(x(j ,tl)*x(k,tl) ,x(l,tl)*x(m,tl)) ; 
=cov(x(j ,tl)*x(k,tl) ,x(l,tl-l)*x(m,tl-l)) 
=cov(x(j ,tl)*x(k,tl) ,x(l,tl-2)*x(m,tl-2)) 
=cov(x(j ,tl)*x(k,tl) ,x(l,tl-3)*x(m,tl-3)) 



% covariances for the 2-step diffs 



111 
112 
113 
114 
115 
116 
117 



=cov(va(j ,k) ,xx(l,tl)*xx(m,tl)) ; 

=cov(v(j ,k) ,xx(l,tl)*xx(m,tl)) ; 

=cov(xx(j ,tl)*xx(k,tl) ,xx(l,tl)*xx(m,tl)) ; 

=cov(xx(j ,tl)*xx(k,tl) ,xx(l,tl-l)*xx(m,tl-l)) 

=cov(xx(j ,tl)*xx(k,tl) ,xx(l,tl-2)*xx(m,tl-2)) 

=cov(xx(j ,tl)*xx(k,tl) ,xx(l,tl-3)*xx(m,tl-3)) 

=cov(xx(j ,tl)*xx(k,tl) ,xx(l,tl-4)*xx(m,tl-4)) 



'/, covariances between the one 
123:=cov(x(j ,tl)*x(k,tl) ,xx(l, 
124:=cov(x(j ,tl)*x(k,tl) ,xx(l, 
125:=cov(x(j ,tl)*x(k,tl) ,xx(l, 
126:=cov(x(j ,tl)*x(k,tl) ,xx(l, 
127:=cov(x(j ,tl)*x(k,tl) ,xx(l, 



and 2 step diffs 

tl)*xx(m,tl)) ; 

tl-l)*xx(m,tl-l)) 

tl-2)*xx(m,tl-2)) 

tl-3)*xx(m,tl-3)) 

tl-4)*xx(m,tl-4)) 



134 
135 
136 
137 



=cov(xx(j ,tl)*xx(k,tl) ,x(l,tl-l)*x(m,tl-l)) 
=cov(xx(j ,tl)*xx(k,tl) ,x(l,tl-2)*x(m,tl-2)) 
=cov(xx(j ,tl)*xx(k,tl) ,x(l,tl-3)*x(m,tl-3)) 
=cov(xx(j ,tl)*xx(k,tl) ,x(l,tl-4)*x(m,tl-4)) 



'/, actual number substitutions 



for 


all 


j 












let 


ex(va(j ,j))=4; 






for 


all 


j.k 


such that j neq k 










let 


ex(va(j ,k) )=1 ; 






for 


all 


j 












let 


ex(v(j , j))=36; 






for 


all 




such that j neq k 










let 


ex(v(j ,k))=-4; 






for 


all 


j 












let 


ex(va(j,j)-2)= (1 


5)-2 


+ (4)-2; 


for 


all 


j.k 


such that j neq k 










let 


ex(va(j ,k)-2)= (0 


75)- 


2 + (l)-2 


for 


all 


j 












let 


ex(v(j,j)-2)= (5)- 


2 + 


(36)-2; 
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for all j,k such that j neq k 

let ex(v(j ,k)-2)= (l)-2 + (-4)-2; 

for all j,k such that j neq k 

let ex(va(j ,j)*va(k,k))=0.2*l + (4)~2; 
for all j,k such that j neq k 

let ex(v(j , j)*v(k,k))= 4 + (36)"2; 

for all j ,tl 

let ex(s(j , j .tl)"2)=(2500) ; 
for all j ,k,tl such that j neq k 

let ex(s(j ,k,tl)-2)=(1000) ; 
for all j ,tl 

let ex(sa(j , j ,tl)"2)=(30) ; 
for all j,k,tl such that j neq k 

let ex(sa(j ,k,tl)-2)=(15) ; 

% index ordering 

for all j ,k such that j neq k and ordp(j ,k) 

let v(k, j)=v(j ,k) ; 
for all j ,k such that j neq k and ordp(j ,k) 

let va(k, j )=va( j ,k) ; 
for all j ,k,tl such that j neq k and ordp(j ,k) 

let s(k, j ,tl)=s(j ,k,tl) ; 
for all j ,k,tl such that j neq k and ordp(j ,k) 

let sa(k, j ,tl)=sa(j ,k,tl) ; 

% just want diagonal terms 
let l=j; 
let m=k; 

% expressions 



Idl 


=11 


ld2 


=12 


ld3 


=13 


ld4 


=14 


ld5 


=15 


ld6 


=16 



Idll 


=111; 


ldl2 


=112; 


IdlS 


=113; 


ldl4 


=114; 


ldl5 


=115; 


ldl6 


=116; 


ldl7 


=117; 


ld23 


=123; 


ld24 


=124; 


ld25 


=125; 


ld26 


=126; 


ld27 


=127; 


ld34 


=134; 


ld35 


=135; 


ld36 


=136; 


ld37 


=137; 



let k=j ; 

lsl:=ll; 
ls2:=12; 
ls3:=13; 
ls4:=14; 
ls5:=15; 
ls6:=16; 

lsll:=lll; 
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lsl2 


=112 


lsl3 


=113 


lsl4 


=114 


lsl5 


=115 


lsl6 


=116 


lsl7 


=117 


ls23 


=123 


ls24 


=124 


ls25 


=125 


ls26 


=126 


ls27 


=127 


ls34 


=134 


ls35 


=135 


ls36 


=136 


ls37 


=137 



% matrix expressions 

on bigf loat ,numval ; 
precision 20; 

n:=6; 

c01:=n*(5)"2 + n*(n-l)*(l)-2; 
c02:=n*(1.5)"2 + n*(n-l) *(0 . 75) "2 ; 
c03:=0; 

c04:=n*ls3 + n*(n-l)*ld3; 
c05:=n*ls4 + n*(n-l)*ld4; 
c06:=n*ls5 + n*(n-l)*ld5; 



c07:=n*ls2 + n*(n-l)*ld2; 
c08:=n*lsl + n*(n-l)*ldl; 



c09 


=n*lsl3 


+ 


n*(n- 


l)*ldl3 


clO 


=n*lsl4 


+ 


n*(n- 


l)*ldl4 


cll 


=n*lsl5 


+ 


n*(n- 


l)*ldl5 


cl2 


=n*lsl6 


+ 


n*(n- 


l)*ldl6 


cl3 


=n*lsl2 


+ 


n*(n- 


l)*ldl2 


cl4 


=n*lsll 


+ 


n*(n- 


l)*ldll 


cl5 


=n*ls23 


+ 


n*(n- 


l)*ld23 


cl6 


=n*ls34 


+ 


n*(n- 


l)*ld34 


cl7 


=n*ls24 


+ 


n*(n- 


l)*ld24 


cl8 


=n*ls35 


+ 


n*(n- 


l)*ld35 


cl9 


=n*ls25 


+ 


n*(n- 


l)*ld25 


c20 


=n*ls26 


+ 


n*(n- 


l)*ld26 



% now do expectations 

cc01:=4; '/, ex(va(i,j)) for i=j 

cc02;=l; '/, ex(va(i,j)) for i neq j 

cc03:=36; '/, ex(v(i,j)) for i=j 

cc04;=-4; '/, ex(v(i,j)) for i neq j 

cc05:=cc01 + 2*cc03; 
cc06:=cc02 + 2*cc04; 
cc07:=2*(cc01 + cc03) ; 
cc08:=2*(cc02 + cc04) ; 

cccl:=n*cc01 + n*(n-l)*cc02; ex(va) 

ccc2:=n*cc03 + n*(n-l)*cc04; Z ex(v) 

ccc3:=n*cc05 + n*(n-l)*cc06; % ex(x') 

ccc4:=n*cc07 + n*(n-l)*cc08; '/, ex(x") 

out "datavec .bd" ; 
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write "Scovvec"; 
write cOl; 
write c02; 
write c03; 
write c04; 
write c05; 
write c06; 
write c07; 
write c08; 
write c09; 
write clO; 
write cll; 
write cl2; 
write cl3; 
write cl4; 
write cl5; 
write cl6; 
write cl7; 
write cl8; 
write cl9; 
write c20; 
write "8meanvec"; 
write cccl; 
write ccc2; 
write ccc3; 
write ccc4; 
shut "datavec .bd" ; 

system "rep datavec. bd ©gauss :bd"; 
bye; 

The first part of the script defines the properties of expectation and covariance. 
The next part sets up the model. Following this, simplifications in the form of known 
orthogonalities are given. The next part of the script produces the algebraic form of 
the covariances which we are interested in. The remainder of the script substitutes 
in the belief specifications made for the example, in order to deduce the covariances 
over the quadratic products, and then for the matrix inner product. Note that the 
covariances for the matrix inner-product are the constant- adjusted versions, as these 
are what is required by [B/D]. Running this script produced the following output. 

Using directory /usr/local/reduce/current as Reduce root 
REDUCE 3.4.1, 15-Jul-92 ... 



1 




1 




2 




2 




3 




3 




4 




4 




5 




5 




6 




6 




7 




7 




8 
9 




8 
9 




10 


10 


11 


11 


12 


12 


13 


13 


14 


14 


15 


15 


16 


16 


17 


17 
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18 


18 






19 


19 






20 


20 


20 




21 


21 






22 


22 






23 


23 






24 


24 


24 




25 


25 






26 


26 






27 


27 






28 


28 


28 


28: 


29 


29 






30 


30 






31 


31 






32 


32 


32 




33 


33 






34 


34 






35 


35 






36 


36 






37 


37 






38 


38 






39 


39 






40 


40 


40 


40 




41 


41 


41 


41 




42 


42 


42 


42 




43 


43 


43 


43 


43 


44 


44 


44 


44 


44 


45 


45 


45 


45 


45 



LI := EX(VA(J,K)*VA(M,L)) - EX(VA(J,K))*EX(VA(M,L)) 

46: 

L2 := 2*( - EX(V(J,K))*EX(V(M,L)) + EX(V( J ,K) *V(M,L) ) ) 
47: 

L3 := - 4*EX(V(K,J))*EX(V(M,L)) + 2*EX(V(L , J) ) *EX(VA(M,K) ) 
+ 2*EX(V(L,K))*EX(VA(M,J)) + 2*EX(V(M, J) )*EX(VA(L,K) ) 
+ 2*EX(V(M,K))*EX(VA(L,J)) + EX(S(K,J,T1 - 1)*S(M,L,T1 - 1)) 
+ EX(S(K,J,T1)*S(M,L,T1)) + 4*EX(V(K , J)*V(M,L) ) 
+ 2*EX(V(L,J)*V(M,K)) + 2*EX(V(L,K)*V(M, J)) 
+ EX(VA(K, J)*VA(M,L)) + EX(SA(K , J ,T1) *SA(M,L,T1) ) 

- EX(VA(K, J))*EX(VA(M,L)) 

48: 

L4 := - 4*EX(V(K,J))*EX(V(M,L)) + EX(S(K,J,T1 - 1)*S(M,L,T1 - 1)) 
+ 4*EX(V(K,J)*V(M,L)) + EX(VA(K, J)*VA(M,L)) 

- EX(VA(K, J))*EX(VA(M,L)) 

49: 

L5 := - 4*EX(V(K,J))*EX(V(M,L)) + 4*EX(V(K , J)*V(M,L) ) 
+ EX(VA(K, J)*VA(M,L)) - EX(VA(K, J))*EX(VA(M,L)) 

50: 

L6 := - 4*EX(V(K,J))*EX(V(M,L)) + 4*EX(V(K , J)*V(M,L) ) 
+ EX(VA(K, J)*VA(M,L)) - EX(VA(K, J))*EX(VA(M,L)) 
51: 51: 51: 

Lll := 2*(EX(VA(J,K)*VA(M,L)) - EX(VA( J ,K) )*EX(VA(M,L) ) ) 
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52: 

L12 := 2*( - EX(V(J,K))*EX(V(M,L)) + EX(V( J ,K) *V(M,L) ) ) 
53: 

LIS := - 4*EX(V(K, J))*EX(V(M,L)) + 4*EX(V(L, J) )*EX(VA(M,K) ) 
+ 4*EX(V(L,K))*EX(VA(M, J)) + 4*EX(V(M, J) )*EX(VA(L,K) ) 
+ 4*EX(V(M,K))*EX(VA(L, J)) + EX(S(K,J,T1 - 2)*S(M,L,T1 - 2)) 
+ EX(S(K, J,T1)*S(M,L,T1)) + 4*EX(V(K, J)*V(M,L)) 
+ 2*EX(V(L, J)*V(M,K)) + 2*EX(V(L,K)*V(M,J)) 
+ 4*EX(VA(K,J)*VA(M,L)) + 2*EX(VA(L, J)*VA(M,K)) 
+ 2*EX(VA(L,K)*VA(M,J)) + EX(SA(K,J,T1 - 1)*SA(M,L,T1 - 1)) 
+ EX(SA(K,J,T1)*SA(M,L,T1)) - 4*EX(VA(K, J) )*EX(VA(M,L) ) 

54: 

L14 := - 4*EX(V(K, J))*EX(V(M,L)) + 4*EX(V(K, J)*V(M,L)) 

+ 4*EX(VA(K,J)*VA(M,L)) + EX(SA(K,J,T1 - 1)*SA(M,L,T1 - 1)) 

- 4*EX(VA(K,J))*EX(VA(M,L)) 

55: 

L15 := - 4*EX(V(K, J))*EX(V(M,L)) + EX(S(K,J,T1 - 2)*S(M,L,T1 - 2)) 
+ 4*EX(V(K, J)*V(M,L)) + 4*EX(VA(K,J)*VA(M,L)) 

- 4*EX(VA(K,J))*EX(VA(M,L)) 

56: 

L16 := 4*( - EX(V(K, J))*EX(V(M,L)) + EX(V(K, J)*V(M,L)) 

+ EX(VA(K,J)*VA(M,L)) - EX(VA(K, J) )*EX(VA(M,L) ) ) 

57: 

L17 := 4*( - EX(V(K, J))*EX(V(M,L)) + EX(V(K, J)*V(M,L)) 

+ EX(VA(K,J)*VA(M,L)) - EX(VA(K, J) )*EX(VA(M,L) ) ) 

58: 58: 58: 

L23 := - 4*EX(V(K, J))*EX(V(M,L)) + EX(V(L, J) )*EX(VA(M,K) ) 
+ EX(V(L,K))*EX(VA(M, J)) + EX(V(M, J))*EX(VA(L,K)) 
+ EX(V(M,K))*EX(VA(L, J)) + EX(S(K , J ,T1) *S(M,L ,T1) ) 
+ 4*EX(V(K, J)*V(M,L)) + 2*EX(VA(K,J)*VA(M,L)) 
+ EX(SA(K,J,T1)*SA(M,L,T1)) - 2*EX(VA(K, J) )*EX(VA(M,L) ) 

59: 

L24 := - 4*EX(V(K, J))*EX(V(M,L)) + EX(S(K,J,T1 - 1)*S(M,L,T1 - 1)) 
+ 4*EX(V(K, J)*V(M,L)) + 2*EX(VA(K,J)*VA(M,L)) 

- 2*EX(VA(K,J))*EX(VA(M,L)) 

60: 

L25 := 2*( - 2*EX(V(K, J))*EX(V(M,L)) + 2*EX(V(K, J)*V(M,L)) 
+ EX(VA(K,J)*VA(M,L)) - EX(VA(K, J) )*EX(VA(M,L) ) ) 

61: 
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L26 := 2*( - 2*EX(V(K, J))*EX(V(M,L)) + 2*EX(V(K, J)*V(M,L)) 
+ EX(VA(K,J)*VA(M,L)) - EX(VA(K, J) )*EX(VA(M,L) ) ) 

62: 

L27 := 2*( - 2*EX(V(K, J))*EX(V(M,L)) + 2*EX(V(K, J)*V(M,L)) 
+ EX(VA(K,J)*VA(M,L)) - EX(VA(K, J) )*EX(VA(M,L) ) ) 

63: 63: 

L34 := - 4*EX(V(K, J))*EX(V(M,L)) + EX(V(L, J) )*EX(VA(M,K) ) 
+ EX(V(L,K))*EX(VA(M, J)) + EX(V(M, J))*EX(VA(L,K)) 
+ EX(V(M,K))*EX(VA(L, J)) + EX(S(K,J,T1 - 2)*S(M,L,T1 - 2)) 
+ 4*EX(V(K, J)*V(M,L)) + 2*EX(VA(K,J)*VA(M,L)) 
+ EX(SA(K,J,T1 - 1)*SA(M,L,T1 - 1)) 

- 2*EX(VA(K,J))*EX(VA(M,L)) 

64: 

L35 := - 4*EX(V(K, J))*EX(V(M,L)) + EX(S(K,J,T1 - 2)*S(M,L,T1 - 2)) 
+ 4*EX(V(K, J)*V(M,L)) + 2*EX(VA(K,J)*VA(M,L)) 

- 2*EX(VA(K,J))*EX(VA(M,L)) 

65: 

L36 := 2*( - 2*EX(V(K, J))*EX(V(M,L)) + 2*EX(V(K, J)*V(M,L)) 
+ EX(VA(K,J)*VA(M,L)) - EX(VA(K, J) )*EX(VA(M,L) ) ) 

66: 

L37 := 2*( - 2*EX(V(K, J))*EX(V(M,L)) + 2*EX(V(K, J)*V(M,L)) 
+ EX(VA(K,J)*VA(M,L)) - EX(VA(K, J) )*EX(VA(M,L) ) ) 



67 


67 


67 


68 


68 




69 


69 




70 


70 




71 


71 


71 


72 


72 




73 


73 




74 


74 




75 


75 


75 


76 


76 




77 


77 


77 


78 


78 




79 


79 




80 


80 




81 


81 


81 


82 


82 




83 


83 




84 


84 




85 


85 


85 


86 






87 


87 


87 



: 67: 67: 



: 81: 



9 

LDl := 

16 

88: 

LD2 := 2 
89: 
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83417 

LD3 := 

16 

90: 

16073 

LD4 := 

16 

91: 

73 

LD5 := 

16 

92: 

73 

LD6 := 

16 

93: 93: 

9 

LDll := — 



94: 

LD12 := 
95: 

LD13 := 
96: 

LD14 := 
97: 

LD15 := 
98: 

LD16 := 
99: 

LD17 := 



233031 
40 

85 
4 

4025 
4 

25 
4 

25 
4 



100: 100: 

10401 
LD23 := 



101: 

8041 
LD24 := 



102: 

41 

LD25 := 



103: 
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LD26 := 

8 

104: 

41 

LD27 := 

8 

105: 105: 

10401 

LD34 := 

8 

106: 

8041 

LD35 := 

8 

107: 

41 

LD36 := 

8 

108: 

41 

LD37 := 

8 

109: 109: 
110: 110: 
9 

LSI := — 
4 

111: 

LS2 := 50 
112: 

46273 

LS3 := 

4 

113: 

10409 

LS4 := 

4 

114: 

409 

LS5 := 

4 

115: 

409 

LS6 := 

4 

116: 116: 

9 

LSll := — 
2 

117: 

LS12 := 50 
118: 

LS13 := 12830 



APPENDIX B. REDUCE FOR COVARIANCE CALCULATIONS 



119: 

LS14 := 139 
120: 

LS15 := 2609 

121: 

LS16 := 109 
122: 

LS17 := 109 
123: 123: 

6421 

LS23 := 

2 

124: 

5209 

LS24 := 

2 

125: 

209 

LS25 := 

2 

126: 

209 

LS26 := 

2 

127: 

209 

LS27 := 

2 

128: 128: 

6421 

LS34 := 

2 

129: 

5209 

LS35 := 

2 

130: 

209 

LS36 := 

2 

131: 

209 

LS37 := 

2 

132: 132: 132: 132: 

*** Please use ROUNDED instead 

133: 
12 



134: 134: 
N := 6 



135: 
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COl := 180 
136: 

C02 := 30.375 
137: 

COS := 

138: 138: 

C04 := 225816.375 

139: 

C05 := 45750.375 
140: 

C06 := 750.375 

141: 141: 
C07 := 360 

142: 

C08 := 30.375 

143: 143: 

C09 := 251753.25 

144: 

CIO := 1471.5 
145: 

Cll := 45841.5 
146: 

C12 := 841.5 

147: 147: 
C13 := 360 

148: 

C14 := 60.75 

149: 149: 

C15 := 58266.75 

150: 

C16 := 58266.75 
151: 

C17 := 45780.75 
152: 

C18 := 45780.75 
153: 

C19 := 780.75 
154: 

C20 := 780.75 

155: 155: 155: 155: 
CCOl := 4 

156: 

CC02 := 1 
157: 

CC03 := 36 
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158: 

CC04 := -4 

159: 159: 
CC05 := 76 

160: 

CC06 := -7 

161: 

CC07 := 80 
162: 

CC08 := -6 

163: 163: 
CCCl := 54 

164: 

CCC2 := 96 
165: 

CCC3 := 246 
166: 

CCC4 := 300 

167: 167: 168: 169: 170: 171: 172: 173: 174: 175: 176: 177: 178: 
179: 180: 181: 182: 183: 184: 185: 186: 187: 188: 189: 190: 191: 
192: 193: 194: 

195: 



196: 196: 
Quitting 



First, the algebraic form of the covariances are given. The REDUCE operators 
v(-) and s(-) correspond to the matrices V,"^ and S'' respectively, and the operators 
va(0 and sa(0 correspond to the matrices V,^ and S'^ respectively. We may re- write 
the derived expressions in more conventional notation as follows. 



Coy{V^%,xlpxi^}) = Cov(l^-„0 (B.l) 



Cov(V^'i;, 4')xi\)) = 2Coy{V^„ F^,) (B.2) 
+ 4[E(V;pE(F-,)+E(l^^)E(T/-,) 
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(B.4) 

Cov(x],^)x(;\xg)_,)Xi\\_^)) = 4Cov(F,-^,0 + Cov(F,-,0 yi,j,k,l,t,s > 2 (B.5) 

Cov(V^-„ = 2Cov(\^-„ F-;) (B.6) 

Cov(\^?^, = 2Cov(\A'^„ F^;) (B.7) 

-2)' '^mi(t-2)) 

-1)' '^w(t-i)) 
+ 4[E(V;pE(F-,) + E(F,^)E(F-^.) 

(B.8) 
(B.9) 
(B.IO) 

Cov(xf = 4Cov(F,';,0+Cov(F,-,F-,) Vi, j, A;, i, . > 3 (B.ll) 

Coy{X^l^xi]!,xl^l^^xl^l^^^) = 4Cov(F,';,F^,) + 2Cov(V^-,F-,) yz,j,kj,t,s > 3 

(B.12) 

Cov(x],^)x(;\ xg)^2)^S+2)) = 4Cov(F,';, F^;) + 2Cov(F,^, F^,) + Cov(5,,.(i_2), 5™;(t-2)) 

(B.13) 

Cov(x]i^xi;UJli)^SVi)) = 2Cov(F,-,F-,) +4Cov(F,-^,F^J 

-1)' '^mi(t-l)) 

+ E(F,pE(F-,)+E(T^^)E(F-;.) 
+ E(F^^.)E(F,^') + E(F^,)E(F,-) 



(B.14) 



+ Cov(5^^,, 5;; J + Cov(5^,, 5- J ^ 
+ E(T^pE(F-,) + E(F/^)E(F,-.) 
+ E(F^,.)E(V^;^)+E(F^,)E(^-) 

Coy{X)',>Xl',>,Xi;i^^Xl^l^_^^) = 4Cov{V,% F^,) + 2Cov(F,-, F^,) + Cov{S^,.^,_,^, S^^,^,_,^) 

(B.16) 
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Cov(4^)4;\41^)Xi'),_^)) = 4Cov(F,-^,F^,) + 2Cov(F,-,0 yi,j,k,l,t,s > 2 

(B.17) 

Vectorising these equations gives the formulae from Section 5.3.4. The rest of the 
output is used as [B/D] input for the example adjustments. 
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